Closed (namp closed this 11 months ago)
I (kind of) answered my own question with a quick and dirty code injection in ingest_service.py
If anyone's interested, feel free to contact me.
Thanks
Hello
I could be interested! I would like to change the PDF conversion from pypdf to something like ChatDOC PDF Parser. It would give me more context from unstructured data such as PDFs.
@namp Would love any insight on how you were able to switch the PDF loader. I'm also encountering the same issue with bulk PDF imports.
Thanks,
I'm getting a lot of errors when parsing PDF files with the new version of privateGPT, for example:
pypdf/_cmap.py", line 369, in parse_bfrange
    ] = unhexlify(fmt2 % c).decode("utf-16-be", "surrogatepass")
        ^^^^^^^^^^^^^^^^^^^
binascii.Error: odd-length string
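For anyone reading the traceback: the last line comes from Python's standard `binascii.unhexlify`, which refuses hex strings with an odd number of digits. Here is a small sketch of that behaviour in isolation. It does not reproduce pypdf's own `parse_bfrange` logic, and `try_unhexlify` is just a hypothetical helper name for illustration.

```python
from binascii import Error as BinasciiError, unhexlify
from typing import Optional


def try_unhexlify(hex_text: str) -> Optional[bytes]:
    """Decode a hex string to bytes, or return None if binascii rejects it."""
    try:
        return unhexlify(hex_text)
    except BinasciiError:
        # Raised for malformed input, e.g. an odd number of hex digits.
        return None


# Four hex digits decode to two bytes, which form one UTF-16-BE code unit.
even = try_unhexlify("0041")
# Three hex digits cannot be split into whole bytes, so this is rejected.
odd = try_unhexlify("041")
```

So a PDF whose character map yields an odd-length hex value at that point would presumably trigger the same exception, though I have not checked which part of the file is responsible.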
The primordial version used PyMuPDF, which parsed my PDF files without issues.
Is there any way to set PyMuPDFLoader as the default loader for ingesting PDF files?
Thanks
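For what it's worth, here is a rough sketch of the general idea of swapping the PDF reader, assuming the ingestion code picks a reader by file extension. This is not privateGPT's actual code and not necessarily what namp did in ingest_service.py. `READERS`, `load_file`, `read_plain_text` and `read_pdf_with_pymupdf` are hypothetical names; only `fitz.open` and `page.get_text()` are real PyMuPDF calls (`pip install pymupdf`).

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# A reader takes a file path and returns one text string per page or chunk.
Reader = Callable[[Path], List[str]]


def read_pdf_with_pymupdf(path: Path) -> List[str]:
    """Extract the text of each page with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the registry loads without it

    with fitz.open(str(path)) as doc:
        return [page.get_text() for page in doc]


def read_plain_text(path: Path) -> List[str]:
    """Read a UTF-8 text file as a single chunk."""
    return [path.read_text(encoding="utf-8")]


# Hypothetical extension-to-reader registry; ".pdf" points at PyMuPDF.
READERS: Dict[str, Reader] = {
    ".txt": read_plain_text,
    ".pdf": read_pdf_with_pymupdf,
}


def load_file(path: Path, readers: Optional[Dict[str, Reader]] = None) -> List[str]:
    """Pick a reader by file extension and return the extracted text."""
    table = READERS if readers is None else readers
    reader = table.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"No reader registered for {path.suffix!r}")
    return reader(path)
```

With something like this, changing the PDF parser is a one-line change to the registry entry for ".pdf", and other file types are left alone.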