nlmatics / nlm-ingestor

This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0
971 stars, 124 forks

API url issues #22

Open drewskidang opened 5 months ago

drewskidang commented 5 months ago

I'm having trouble using the custom URL. When I use the example given it works fine, but when using my own server I get this issue:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

(.conda) (base) root@LangLlama:~/wsl_projects/titan/nlm-ingestor# /root/wsl_projects/titan/nlm-ingestor/.conda/bin/python /root/wsl_projects/titan/nlm-ingestor/customrag.py
Traceback (most recent call last):
  File "/root/wsl_projects/titan/nlm-ingestor/customrag.py", line 47, in <module>
    process_pdfs(pdf_directory)
  File "/root/wsl_projects/titan/nlm-ingestor/customrag.py", line 17, in process_pdfs
    docs = pdf_reader.read_pdf(pdf_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/wsl_projects/titan/nlm-ingestor/.conda/lib/python3.11/site-packages/llmsherpa/readers/file_reader.py", line 73, in read_pdf
    blocks = response_json['return_dict']['result']['blocks']
KeyError: 'result'

jpbalarini commented 5 months ago

Having the same issue on some files

opiethehokie commented 5 months ago

The llmsherpa code doesn't seem to handle nlm-ingestor errors well, so I think you'll see this error any time reading a PDF fails. To see the specific nlm-ingestor error, you need to look at the server side: check the output of python -m nlm_ingestor.ingestion_daemon, which is the command started from run.sh.
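
For anyone who wants a clearer message on the client side while debugging, here is a minimal sketch of a defensive check around the server reply. It is not part of llmsherpa: `extract_blocks` and `IngestorResponseError` are hypothetical names made up for this example. The only thing taken from the traceback above is the key path `return_dict` -> `result` -> `blocks`. What the server actually puts in a failed reply is not assumed here, so the sketch just prints whatever came back.

```python
import json


class IngestorResponseError(RuntimeError):
    """Hypothetical error type: the server reply did not contain parsed blocks."""


def extract_blocks(raw_body: str) -> list:
    """Return the block list from a raw server reply, or raise a readable error.

    Hypothetical helper for illustration; llmsherpa does not provide it.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        # An empty or non-JSON body is what produces
        # "Expecting value: line 1 column 1 (char 0)".
        raise IngestorResponseError(
            f"Server did not return JSON (first 200 chars: {raw_body[:200]!r})"
        ) from exc

    # Walk the key path from the traceback without assuming any key exists.
    return_dict = payload.get("return_dict") if isinstance(payload, dict) else None
    result = return_dict.get("result") if isinstance(return_dict, dict) else None
    if not isinstance(result, dict) or "blocks" not in result:
        raise IngestorResponseError(
            f"Reply has no return_dict.result.blocks; payload was: {payload!r}"
        )
    return result["blocks"]
```

Passing the raw response text through a check like this turns both the JSONDecodeError and the KeyError: 'result' into one error that shows what the server actually sent. The real cause still has to be read from the ingestion daemon output.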

Alphastream-Admin commented 5 months ago

any update on this issue?

drewskidang commented 5 months ago

@jpbalarini I ran the Docker image instead and had no issues :)