bvandermeersch opened 4 years ago
I had to use the AWS Lambda Python 3.6 runtime to make this work.
The Python 3.8 runtime for AWS Lambda no longer includes curl, tar, or zip, among other packages.
I think I got to the same spot you did. I changed /opt/instdir/program/soffice to /opt/instdir/program/soffice.bin, but then I got:
`sh: instdir/program/soffice.bin: Permission denied`
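The claim about missing binaries is easy to verify from inside a handler; a minimal sketch using only the standard library (the binary names are just the ones mentioned above, and the actual result depends on the runtime you run it in):

```python
import shutil

def missing_binaries(names):
    """Return the names that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]

# On the Python 3.8 (Amazon Linux 2) runtime, curl/tar/zip reportedly
# show up in this list; locally you will likely get [].
print(missing_binaries(["curl", "tar", "zip"]))
```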
While this is probably laughable, here is my code:
```python
import boto3
import os

s3_bucket = boto3.resource("s3").Bucket("************")
convertCommand = "instdir/program/soffice.bin --headless --invisible --nodefault --nofirststartwizard --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp"
client = boto3.client('s3')
resource = boto3.resource('s3')

def download_dir(client, resource, dist, local='/tmp', bucket='your_bucket'):
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        if result.get('CommonPrefixes') is not None:
            for subdir in result.get('CommonPrefixes'):
                download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        for file in result.get('Contents', []):
            dest_pathname = os.path.join(local, file.get('Key'))
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
            resource.meta.client.download_file(bucket, file.get('Key'), dest_pathname)

def lambda_handler(event, context):
    print("Starting Process")
    print("Starting Download")
    download_dir(client, resource, 'instdir/', '/tmp', bucket='********')
    print("Download Complete")
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        print("Starting Conversion")
        print(os.system("cd /tmp && ls"))
        print(os.system("cd /tmp/instdir && ls"))
        print(os.system("cd /tmp/instdir/program && ls"))
        # Execute libreoffice to convert input file
        os.system(f"cd /tmp && sudo {convertCommand} {key}")
        print("Conversion Complete")
        # Save converted object in S3
        print("Starting Save")
        outputFileName, _ = os.path.splitext(key)
        outputFileName = outputFileName + ".pdf"
        f = open(f"/tmp/{outputFileName}", "rb")
        s3_bucket.put_object(Key=outputFileName, Body=f, ACL="private")
        print("Saving Complete")
        f.close()
```
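The Permission denied error is consistent with the execute bit missing on soffice.bin: S3 does not preserve POSIX permissions, so files downloaded into /tmp come back non-executable. The working 3.8 code further down shells out to `chmod u+x`; the same fix can be done in pure Python, as a sketch:

```python
import os
import stat

def make_executable(path):
    """Equivalent of `chmod u+x path`: add the owner execute bit."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)

# e.g. make_executable("/tmp/instdir/program/soffice.bin")
```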
I was not able to figure out how to get around the missing curl, tar, and other dependencies, so I decompressed the file and uploaded it to an S3 bucket. I work with a company that has to keep dependencies to a minimum because we work with sensitive data all the time. So I went through all the steps here but hit a snag with the 3.8 solution. Guess I will have to settle for the 3.6 solution for now.
So while I could have worked with the NPM module to get tar and brotli to decompress the file, I decided to decompress it locally on my machine (using PeaZip) and upload just a zip file to a different S3 bucket from the drop bucket and the output bucket.
Note: the zip decompression is handled in memory and not written to disk until extractall("/tmp"), so you may need to allocate roughly 200-300 MB more memory to this function if you are under the max. I used 1600 MB for this code example.
Below is working Python 3.8 code:
```python
import boto3
import os
from urllib.parse import unquote_plus
from zipfile import ZipFile
from io import BytesIO

s3_bucket = boto3.resource("s3").Bucket("************-output")  # output bucket
zip_obj = boto3.resource("s3").Object(bucket_name="*********-pdf", key="instdir.zip")  # bucket that has your zip file in it

# Runs once per cold start: pull the LibreOffice zip and unpack it to /tmp
buffer = BytesIO(zip_obj.get()["Body"].read())
z = ZipFile(buffer)
z.extractall("/tmp")

convertCommand = "instdir/program/soffice.bin --headless --norestore --invisible --nodefault --nofirststartwizard --nolockcheck --nologo --convert-to 'pdf:writer_pdf_Export' --outdir /tmp"
resource = boto3.resource('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # S3 event keys are URL-encoded; unquote_plus also turns '+' into spaces
        key = unquote_plus(record['s3']['object']['key'])
        # Execute libreoffice to convert input file
        print("Elevating Permissions")
        os.system("chmod u+x /tmp/instdir/program/soffice.bin")
        print("Permissions Elevated")
        print("Downloading File")
        resource.meta.client.download_file(bucket, key, f"/tmp/{key}")
        print("File Downloaded")
        print("Starting Conversion")
        # Not sure why you have to run this twice, but it works on the second run consistently
        os.system(f"cd /tmp && {convertCommand} '{key}'")
        os.system(f"cd /tmp && {convertCommand} '{key}'")
        print("Conversion Complete")
        # Save converted object in S3
        print("Starting Save")
        outputFileName, _ = os.path.splitext(key)
        outputFileName = outputFileName + ".pdf"
        with open(f"/tmp/{outputFileName}", "rb") as f:
            s3_bucket.put_object(Key=outputFileName, Body=f, ACL="private")
        print("Saving Complete")
```
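If the extra memory for the in-memory BytesIO is a problem, a lower-memory variant is to download the archive to /tmp first and extract from disk. A sketch, assuming the archive has already been fetched with `download_file` (the bucket/key names below are placeholders, not confirmed names):

```python
import zipfile

def extract_archive(zip_path, dest):
    """Extract a zip already on disk; peak memory stays near the size of
    a single member instead of the whole archive held in a BytesIO."""
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(dest)
        return sorted(z.namelist())

# In the handler, this would be preceded by something like:
#   boto3.client("s3").download_file("*********-pdf", "instdir.zip", "/tmp/instdir.zip")
#   extract_archive("/tmp/instdir.zip", "/tmp")
```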
Seems the Python example located here no longer works:
https://github.com/vladgolubev/serverless-libreoffice/blob/master/STEP_BY_STEP.md
Seems tar/zip are no longer available in the Amazon Linux 2 Python 3.8 runtime.
I've also attempted to make it a layer, but I get this error when running it:
`sh: /opt/instdir/program/soffice: No such file or directory`
Even though it clearly is there.
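For what it's worth, `No such file or directory` from sh for a file that plainly exists usually means the kernel could not find the binary's ELF interpreter (dynamic loader) or, for a wrapper script, its target; it does not necessarily mean the path itself is wrong. A small sketch to separate the cases (the loader explanation is a common cause to verify with `ldd`, not a diagnosis of this exact setup):

```python
import os

def diagnose_exec(path):
    """Rough triage for `sh: path: No such file or directory`."""
    if not os.path.exists(path):
        return "file really is missing"
    if not os.access(path, os.X_OK):
        return "file exists but is not executable"
    # File exists and is executable: the usual culprit is a missing ELF
    # interpreter or shared library (inspect the binary with `ldd`).
    return "file exists; suspect missing loader or shared libraries"
```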
Anyone else get this working?