jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in python.
https://borbpdf.com/

Borb Incompatible with the Multiprocessing Library #185

Closed DrPlanecraft closed 7 months ago

DrPlanecraft commented 10 months ago

Describe the bug: In a multi-process application, a worker process (one spawned by multiprocessing) cannot save the PDF, and the document cannot otherwise be encoded to be sent back to the main process.

To Reproduce Download file: Artwork_1.pdf

Run:

import random as r
import traceback
from multiprocessing import Pool

from borb.pdf import PDF


def MultiProcessingObj(args):
    # args is a (document, count) tuple; it is pickled to reach the worker.
    cnt = 0
    while cnt <= args[1]:
        with open(f"Artwork_output_{cnt}.pdf", "wb") as file:
            PDF.dumps(file, args[0])
        cnt += 1
        print("incremented")
    return args[0]


if __name__ == "__main__":
    try:
        with open("test/Artwork_1.pdf", "rb") as f:
            pdfObj = PDF.loads(f)

        # Each task carries the parsed document plus a random write count.
        genSet = ((pdfObj, r.randint(1, 10)) for _ in range(r.randint(10, 100)))
        with Pool(processes=2) as pool:
            result = pool.imap_unordered(
                func=MultiProcessingObj,
                iterable=genSet
            )

            for i in result:
                print(i)

        with open("Artwork_output_final.pdf", "wb") as file:
            PDF.dumps(file, pdfObj)

    except Exception:
        # Print the full traceback of whatever went wrong.
        traceback.print_exc()

Expected behaviour It should create multiple PDFs in the current working directory ( os.getcwd() )

Traceback

Traceback (most recent call last):
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\proofreaderOps.py", line 472, in findOnPDF
    PDF.dumps(buffer,self.artwork)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\pdf\pdf.py", line 64, in dumps
    WriteAnyObjectTransformer().transform(
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\document\document_transformer.py", line 125, in transform
    self.get_root_transformer().transform(object_to_transform["XRef"], context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\reference\xref_transformer.py", line 142, in transform
    self.get_root_transformer().transform(object_to_transform["Trailer"]["Root"], context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\document\catalog_transformer.py", line 109, in transform
    return super(CatalogTransformer, self).transform(object_to_transform, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\object\dictionary_transformer.py", line 122, in transform
    self.get_root_transformer().transform(e, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\page\pages_transformer.py", line 71, in transform
    self.get_root_transformer().transform(p, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\page\page_transformer.py", line 78, in transform
    super(PageTransformer, self).transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\object\dictionary_transformer.py", line 122, in transform
    self.get_root_transformer().transform(e, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\object\dictionary_transformer.py", line 122, in transform
    self.get_root_transformer().transform(e, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\any_object_transformer.py", line 107, in transform
    super().transform(object_to_transform, context)
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\transformer.py", line 251, in transform
    return_value = h.transform(
                   ^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\borb\io\write\object\dictionary_transformer.py", line 86, in transform
    ) and not v.is_inline():
              ^^^^^^^^^^^
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\PIL\Image.py", line 529, in __getattr__
    raise AttributeError(name)
AttributeError: is_inline
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\flask\app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\flask\app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\HumanKind API\flask_main.py", line 195, in ProofreaderPDF
    for Pdf, doc in results:
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\multiprocessing\pool.py", line 873, in next
    raise value
AttributeError: is_inline


jorisschellekens commented 7 months ago

For some reason, the Image is being copied. An Image doesn't naturally have the methods needed to work in the borb IO framework, so there is a special method that adds them to an Image (or to any object, really). But copying the object doesn't carry the added methods over, which is where it goes wrong. is_inline is one of these magic methods.
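
A minimal sketch of how an added method can get lost, assuming the image class serializes only a fixed set of fields. Arguments handed to a multiprocessing worker are pickled, so the copy seen by the worker is rebuilt from that pickled state alone. `FakeImage` and `add_is_inline` are hypothetical stand-ins for illustration, not borb or PIL code.

```python
import pickle
import types


class FakeImage:
    """Hypothetical stand-in for an image class that pickles only chosen fields."""

    def __init__(self, size):
        self.size = size

    def __getstate__(self):
        # Only the declared field is serialized; anything attached later is dropped.
        return {"size": self.size}

    def __setstate__(self, state):
        self.size = state["size"]


def add_is_inline(obj):
    """Hypothetical helper: attach an is_inline() method to a single instance."""
    obj.is_inline = types.MethodType(lambda self: False, obj)
    return obj


original = add_is_inline(FakeImage((10, 10)))

# Round-trip through pickle, as happens to arguments sent to a worker process.
clone = pickle.loads(pickle.dumps(original))

print(hasattr(original, "is_inline"))
print(hasattr(clone, "is_inline"))
```

Under these assumptions the original still answers `is_inline()`, while the round-tripped copy no longer has the attribute. That would match the `AttributeError: is_inline` raised in the worker.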

jorisschellekens commented 7 months ago

When I try to reproduce this issue with another PDF, the stacktrace now points to a part of fonttools:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
  File "/home/joris/PycharmProjects/borb-sandbox-feb-8-2024/.venv/lib/python3.10/site-packages/fontTools/afmLib.py", line 354, in __getattr__
    if attr in self._attrs:
TypeError: argument of type 'NoneType' is not iterable
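
One plausible reading of this trace, sketched with a hypothetical class: during unpickling the object exists before its instance attributes are restored, and the loader looks up `__setstate__` on it. If `__getattr__` relies on an attribute that is still at its class-level default of `None`, the membership test fails. `FakeAFM` and its `_attrs = None` default are assumptions made for illustration, not the actual fontTools source.

```python
import pickle


class FakeAFM:
    """Hypothetical stand-in; not the real fontTools class."""

    _attrs = None  # assumed class-level default, replaced per instance in __init__

    def __init__(self):
        self._attrs = {"FontName": "Demo"}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self._attrs:
            return self._attrs[name]
        raise AttributeError(name)


font = FakeAFM()
payload = pickle.dumps(font)  # serializing a fully initialized instance works

try:
    # The new instance is created without __init__, so _attrs is still None
    # when the unpickler probes for __setstate__ via __getattr__.
    pickle.loads(payload)
    outcome = "loaded"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)
```

In this sketch the failure happens on the receiving side of the queue, inside `pickle.loads`, which is the same place the trace above points to.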

I have no desire to re-implement fonttools. And if I want to continue using that library, it means I am limited by whatever limits they happen to have.

In short, it seems like even if I solved the multiprocessing issue in borb, it would still not be enough. One of the libraries on which borb depends also fails when its objects are pickled and sent between processes.

So, for now, I am going to close this issue.

Kind regards, Joris Schellekens