Open Ianpwest opened 3 months ago
Hey, @Ianpwest did you manage to solve this?
Hey, @Ianpwest did you manage to solve this?
@wolfassi123 No, there were also some other parsing issues with different character sets. The library is promising but seemingly under supported. No movement on my tickets.
Hello @Ianpwest, @wolfassi123, I have fixed the issue and seems to be working with the sample PDF provided here. Can you do a pull from the main branch of nlm-ingestor and verify?
PDF in question: JTR.pdf
This api call works great
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all" pdf_url = "JTR.pdf" pdf_reader = LayoutPDFReader(llmsherpa_api_url) doc = pdf_reader.read_pdf(pdf_url)
This local call fails
llmsherpa_api_url = "[http://localhost:5010/api/parseDocument?renderFormat=all"](http://localhost:5010/api/parseDocument?renderFormat=all%22) pdf_url = "JTR.pdf" pdf_reader = LayoutPDFReader(llmsherpa_api_url) doc = pdf_reader.read_pdf(pdf_url)
The local version is running from the latest docker build. Other pdfs work fine. Is there a way to get a better error message? Currently receiving: KeyError: 'return_dict'
I noticed there are other issues open around this error but did not find any matching this case where it works on one and not the other.
I appreciate your time and any insight. Thanks!