t-houssian / fillpdf

A Python library to make filling PDFs much easier
MIT License

Stream output_pdf_path to AWS S3 #43

Closed: cmh234 closed this issue 1 year ago

cmh234 commented 1 year ago

Hi, I can read the PDF from AWS S3 like this:

    import io
    from io import BytesIO

    import boto3
    from fillpdf import fillpdfs

    bucket = "advance-directive"
    key = "assets/CA.test.pdf"
    key2 = "assets/CA.test.2.pdf"

    # Region and credentials elided
    s3 = boto3.client('s3',
                      region_name='....',
                      aws_access_key_id='...',
                      aws_secret_access_key="....")

    # Download the source PDF into an in-memory buffer
    bytes_buffer = io.BytesIO()
    s3.download_fileobj(Bucket=bucket, Key=key, Fileobj=bytes_buffer)
    byte_value = bytes_buffer.getvalue()

    # reader = PdfReader(BytesIO(byte_value))
    fillpdfs.get_form_fields(BytesIO(byte_value), sort=False, page_number=None)
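As an aside, the round trip through getvalue() and a second BytesIO is not strictly needed: rewinding the buffer that download_fileobj filled already gives a readable file-like object. A minimal sketch, assuming a boto3 S3 client; the helper name `download_to_buffer` is hypothetical and not part of boto3 or fillpdf.

```python
import io


def download_to_buffer(s3_client, bucket: str, key: str) -> io.BytesIO:
    """Download an S3 object into memory and return a rewound BytesIO.

    `s3_client` is assumed to be a boto3 S3 client (any object with a
    compatible download_fileobj method works). This helper is hypothetical.
    """
    buffer = io.BytesIO()
    s3_client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    # After download_fileobj writes to the buffer, its position is no longer
    # at the start, so rewind before handing it to a reader.
    buffer.seek(0)
    return buffer
```

It could then be used as `fillpdfs.get_form_fields(download_to_buffer(s3, bucket, key), sort=False, page_number=None)`. If the same buffer is later passed to write_fillable_pdf too, it would likely need another seek(0) first, because the first reader consumes it.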
In other apps, I save to S3 like this:

    # `writer` is a PdfWriter instance created earlier (not shown)
    with BytesIO() as bytes_stream:
        writer.write(bytes_stream)
        bytes_stream.seek(0)
        s3.put_object(Body=bytes_stream, Bucket=bucket, Key=key2, ContentType='application/pdf')

How do I get the output as bytes that I can put on AWS, instead of writing it to output_pdf_path?

    fillpdfs.write_fillable_pdf(BytesIO(byte_value), output_pdf_path, {'name': 'my name'}, flatten=False)

cmh234 commented 1 year ago

I found a solution.

To save your altered PDF to memory in an object that can be passed around (instead of writing to a file), simply create an empty instance of io.BytesIO:

    from io import BytesIO

    new_bytes_object = BytesIO()

Then pass it to fillpdfs.write_fillable_pdf in place of the output path:

    fillpdfs.write_fillable_pdf(BytesIO(byte_value), new_bytes_object, {'name': 'my name'}, flatten=False)

Before you attempt to read from new_bytes_object, don't forget to rewind it to the beginning with seek(0). Otherwise, the object seems empty.

    new_bytes_object.seek(0)
    s3.put_object(Body=new_bytes_object, Bucket=bucket, Key=key2, ContentType='application/pdf')
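The rewind-then-upload step can be wrapped in a small helper so the seek(0) is never forgotten. A sketch assuming a boto3 S3 client, whose put_object accepts bytes or a seekable file-like object as Body; `upload_pdf_buffer` is a hypothetical name, not part of fillpdf or boto3.

```python
import io


def upload_pdf_buffer(s3_client, buffer: io.BytesIO, bucket: str, key: str) -> None:
    """Rewind an in-memory PDF and upload it to S3 with a PDF content type.

    `s3_client` is assumed to be a boto3 S3 client; this helper is hypothetical.
    """
    # Writing to the buffer leaves its position at the end of the data.
    # Without rewinding, the upload may read from there and send an empty body.
    buffer.seek(0)
    s3_client.put_object(
        Body=buffer,
        Bucket=bucket,
        Key=key,
        ContentType="application/pdf",
    )
```

In the flow above, it would be called as `upload_pdf_buffer(s3, new_bytes_object, bucket, key2)` right after write_fillable_pdf returns.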

This works because io.BytesIO objects act like file objects (also known as file-like objects). io.BytesIO and related classes such as io.StringIO behave like files held in memory.
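The stdlib-only snippet below illustrates both points: code that expects a file object works unchanged with io.BytesIO, and a read after writing returns nothing until the position is rewound. `write_pdf_like` is a hypothetical stand-in for any function that takes an output file object, such as the output argument of write_fillable_pdf.

```python
import io


def write_pdf_like(fileobj) -> None:
    """Hypothetical writer that only assumes a binary file-like object."""
    fileobj.write(b"%PDF-1.4 example")


buffer = io.BytesIO()
write_pdf_like(buffer)  # the same call would work with open("out.pdf", "wb")

# After writing, the position is at the end, so read() returns nothing.
print(buffer.tell())      # 16
print(buffer.read())      # b''

# getvalue() ignores the position and returns everything written so far.
print(buffer.getvalue())  # b'%PDF-1.4 example'

# Rewinding with seek(0) makes read() return the full contents again.
buffer.seek(0)
print(buffer.read())      # b'%PDF-1.4 example'

# io.StringIO is the text (str) counterpart of io.BytesIO.
text_buffer = io.StringIO()
text_buffer.write("name: my name")
text_buffer.seek(0)
print(text_buffer.read())  # name: my name
```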

Retrieved from: https://stackoverflow.com/questions/68985391/writing-a-python-pdfrw-pdfreader-object-to-an-array-of-bytes-filestream