sangee2004 closed this issue 4 months ago.
I get this error with every PDF text extractor I have tested so far, including the commercial offering of MuPDF, so I don't know whether we'll be able to solve it anytime soon. Have you found any PDFs other than this one that produce this error? It appears to be a structural error in the file itself.
I have not encountered this error with any other PDF files I have tested so far. However, I was previously able to ingest this PDF file successfully when testing with the Python version of the tool: https://github.com/gptscript-ai/knowledge-retrieval-api
Fixed in HEAD (the tip of the main branch).
Tested with the latest version of knowledge.
I was able to ingest the PDF file attached to this issue and retrieve information from it successfully.
Steps to reproduce the problem:
Ingesting the PDF file fails with a "def of non-name" error.