Closed: player1024 closed this issue 8 months ago
Hi @player1024 - Please share your code. Parallelizing this should be similar to parallelizing any IO task. I think it will be better to create a separate LayoutPDFReader instance for each thread rather than reuse the same one.
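To illustrate the one-instance-per-task idea, here is a minimal sketch using threads from the standard library. It is only an outline under assumptions, not a tested recipe for your setup: `parse_one`, `parse_many` and `make_reader` are hypothetical names, and `make_reader` stands for any zero-argument callable that returns an object with a `read_pdf(path)` method. Because threads share memory, nothing is sent to another process, so nothing needs to be pickled.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def parse_one(pdf_path, make_reader):
    """Parse one PDF with a reader created just for this task."""
    # Hypothetical factory: with llmsherpa this could be something like
    # lambda: LayoutPDFReader(llmsherpa_api_url)
    reader = make_reader()
    return reader.read_pdf(pdf_path)


def parse_many(pdf_paths, make_reader, max_workers=4):
    """Parse several PDFs concurrently, returning results in input order."""
    worker = partial(parse_one, make_reader=make_reader)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Threads run in the same process, so tasks are not pickled.
        return list(pool.map(worker, pdf_paths))
```

With llmsherpa, you would pass something like `lambda: LayoutPDFReader(llmsherpa_api_url)` as `make_reader`, where `llmsherpa_api_url` is whatever parser URL you already use. Threads are a reasonable fit here if most of the time is spent waiting on the parser API, and `max_workers` may need tuning to what the server can handle.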
Closing the issue as it has been resolved.
I am trying to parallelize the ingestion of multiple locally stored PDFs into my vectorstore.
When I try to load multiple documents with joblib, I get a pickling error:
PicklingError: Could not pickle the task to send it to the workers.
Is this because loading each PDF with llmsherpa involves an API call to an external server? What would be a workaround for this? Should I make this async, and if so, how?
I think this is important for production.
Thank you.