reingart / pyfpdf

Simple PDF generation for Python (FPDF PHP port)
https://code.google.com/p/pyfpdf/
GNU Lesser General Public License v3.0

Added support for BytesIO as an alternative to sending a filename, and support for accepting previously processed image data to avoid reprocessing #187

Open hemanshukale opened 3 years ago

hemanshukale commented 3 years ago

A BytesIO instance can now be sent instead of a filename. Image data, once processed, can be stored and reused, even in separate instances of FPDF.

More details and code examples are in the following comments.

hemanshukale commented 3 years ago

BytesIO addition (bd19403)

I was working with images created within the script itself, so I needed to pass them to FPDF without storing them on the filesystem. I therefore added BytesIO support so an image can be passed without saving it first.

Usage:

import io
from fpdf import FPDF

f = FPDF()
f.add_page()
buffer = io.BytesIO()
img.save(buffer, format='PNG')  # img is a PIL.Image instance; format='JPEG' also works
ioBuffer = io.BytesIO(buffer.getvalue())
# x, y, w, h are your own placement values
f.image(ioBuffer, x, y, w, h, type="png", sub_type='tounique')  # the new sub_type makes sure this instance is stored under a unique key

Reasoning for adding sub_type=tounique

FPDF.images is a dictionary that stores the processed image data, keyed by the name argument of FPDF.image(). If you pass a BytesIO instance as that argument, its address can repeat (for example, after an earlier buffer has been freed), so the script treats the new data as the same image sent again and does not process it. Passing sub_type='tounique' appends the index number to the name to make the key unique.
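The key collision described above can be sketched with a toy cache. This is only an illustration, not the pyfpdf code: the ImageCache class, its add() method, and the "buffer@0x1" name are hypothetical stand-ins, and the exact key format used by the patch may differ.

```python
class ImageCache:
    """Toy stand-in for the FPDF.images dictionary (hypothetical, not pyfpdf code)."""

    def __init__(self):
        self.images = {}

    def add(self, name, data, unique=False):
        key = str(name)
        if unique:
            # Mimics the idea behind sub_type='tounique': append the next index.
            key = "%s-%d" % (key, len(self.images) + 1)
        if key not in self.images:
            # Only unseen keys are "processed" and stored.
            self.images[key] = {"i": len(self.images) + 1, "data": data}
        return key


cache = ImageCache()
# Two different payloads that happen to arrive under the same name.
k1 = cache.add("buffer@0x1", b"first image")
k2 = cache.add("buffer@0x1", b"second image")  # silently reuses the first entry
# With the unique flag, the second payload gets its own entry.
k3 = cache.add("buffer@0x1", b"second image", unique=True)
```

Without the flag, the second payload is never stored because its key already exists; with it, each call gets a separate entry.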

hemanshukale commented 3 years ago

Reusing processed image data (a35c701)

In another use case, I needed to place the same image multiple times in multiple PDFs. With a BytesIO instance or a filename, the image is processed once per FPDF instance and reused within that instance, but it is processed again for every new PDF. To avoid this reprocessing, we can store the dictionary produced by processing the data once and pass it to FPDF.image() every other time.

Usage:

f = FPDF()
processedDict = f._parsepng(ioBuffer)  # returns the processed image dict
processedDictHash = FPDF.s256(processedDict)  # hash of the processed dict data

first = True  # True until image() has been called once on this FPDF instance
while condition:  # placeholder for your own loop condition
    if first:
        # First call: send the whole processed dict
        f.image(processedDict, x, y, w, h, type='png', sub_type='tohash')
        first = False
    else:
        # Later calls: send only the hash of the processed dict
        f.image(processedDictHash, x, y, w, h, type='png')

Reasoning for adding sub_type=tohash

However, this could mean sending the same dict (which can be several MB in size) on every call, and FPDF.image() would always have to check whether it is already present in FPDF.images. Hence, when sub_type='tohash' is passed, the key used to store the processed data is derived from the hash of the processed dictionary. Every time we need the same image again (except the first time per FPDF instance), we can send just the hash of the dict, and the script compares only that.
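A rough sketch of the hash-keyed lookup described above, using only the standard library. The names dict_digest and HashKeyedCache are hypothetical, the sample dict fields are made up for illustration, and the use of SHA-256 is an assumption based on the name s256; the patch may compute its hash differently.

```python
import hashlib


def dict_digest(info):
    """Hypothetical helper: SHA-256 hex digest over a processed-image dict."""
    h = hashlib.sha256()
    for key in sorted(info):  # sorted keys make the digest order-independent
        h.update(str(key).encode("utf-8"))
        value = info[key]
        h.update(value if isinstance(value, bytes) else repr(value).encode("utf-8"))
    return h.hexdigest()


class HashKeyedCache:
    """Toy cache: accepts a full dict once, then only its digest."""

    def __init__(self):
        self.images = {}

    def image(self, ref):
        if isinstance(ref, dict):
            # First use: store the whole dict under its digest.
            key = dict_digest(ref)
            self.images.setdefault(key, ref)
            return key
        # Later uses: ref is the digest string, so only a short key is compared.
        if ref not in self.images:
            raise KeyError("unknown image digest: %r" % (ref,))
        return ref


# Illustrative stand-in for a processed image dict.
info = {"w": 2, "h": 2, "cs": "DeviceRGB", "bpc": 8, "data": b"\x00" * 12}
digest = dict_digest(info)

pdf = HashKeyedCache()
first = pdf.image(info)    # sends the full dict once
again = pdf.image(digest)  # afterwards, the digest alone is enough
```

The point of the design is that repeated calls compare a fixed-length string instead of passing and checking a large dictionary.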

If you run into unexplained issues, try passing a copy.deepcopy of processedDict or processedDictHash instead of the original variable.