nlmatics / nlm-ingestor

This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0
1.1k stars 158 forks

JSON Decode error when #84

Open shumin018 opened 3 months ago

shumin018 commented 3 months ago

Hello, I've followed the instructions to host llmsherpa on my own resources, but when I try to access it via an external URL, I get a JSON decode error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any help, please? Thanks!

Code:

from llmsherpa.readers import LayoutPDFReader
llmsherpa_api_url = 'http://myurl.com/api/parseDocument?renderFormat=all&useNewIndentParser=yes'
pdf_url = "https://abc.xyz/assets/91/b3/3f9213d14ce3ae27e1038e01a0e0/2024q1-alphabet-earnings-release-pdf.pdf"

pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)

shubhampatwa commented 3 months ago

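Check whether the nlm-ingestor server at http://myurl.com is actually running and reachable.

JSONDecodeError: Expecting value: line 1 column 1 (char 0) is raised when the server is not working: the client tries to parse the response as JSON, but the body is empty or is not JSON at all (for example, an HTML error page).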
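
To tell "server is down" apart from "server answered, but not with JSON", it can help to look at the raw HTTP response before handing the URL to LayoutPDFReader. The following is only a minimal sketch using the Python standard library. The helper names `explain_response` and `probe` are made up for illustration and are not part of llmsherpa or nlm-ingestor, and the probe sends an empty POST rather than a real PDF upload, so a non-200 status from a healthy server is possible.

```python
import json
import urllib.error
import urllib.request


def explain_response(status_code: int, body: str) -> str:
    """Describe a response the way a JSON-expecting client would see it."""
    if status_code != 200:
        return (
            f"HTTP {status_code}: server reachable, but the request failed "
            "(check the URL path and port)"
        )
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        # This is the same error class reported in the issue.
        return f"HTTP 200 but body is not JSON: {exc}"
    return "ok"


def probe(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: POST an empty body to `url` and describe the reply."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return explain_response(response.status, body)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx replies land here; the server is up but rejected the request.
        return explain_response(exc.code, exc.read().decode("utf-8", "replace"))
    except OSError as exc:
        # Connection refused, DNS failure, timeout, and similar.
        return f"server not reachable: {exc}"


if __name__ == "__main__":
    # Replace with your own parseDocument URL.
    print(probe("http://myurl.com/api/parseDocument?renderFormat=all"))
```

An "HTTP 404" result points at the URL path or port mapping, while "server not reachable" points at the host, firewall, or container itself.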

emilyweber35 commented 3 months ago

I'm getting the same error, and my URL reports that the service is running.

Quang-elec44 commented 2 months ago

I'm having the same issue. Here is my code:

from llmsherpa.readers import LayoutPDFReader

llmsherpa_api_url = "http://localhost:5010/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "myfile.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)

In the container, here is the log:

 * Serving Flask app '__main__'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5001
 * Running on http://172.17.0.3:5001
Press CTRL+C to quit
172.17.0.1 - - [10/Sep/2024 04:11:06] "POST /api/document/developer/parseDocument?renderFormat=all HTTP/1.1" 404 -
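
The 404 in this log suggests the request did reach the Flask server, but the path /api/document/developer/parseDocument is not one it serves. One thing worth trying is the shorter /api/parseDocument path used in the URL from the first post. That this path is the right one for this server is an assumption and is not confirmed anywhere in this thread. A small sketch of swapping only the path while keeping host, port, and query string (`with_path` is a hypothetical helper):

```python
from urllib.parse import urlsplit, urlunsplit


def with_path(url: str, new_path: str) -> str:
    """Return `url` with its path replaced, leaving host, port and query intact."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=new_path))


failing_url = "http://localhost:5010/api/document/developer/parseDocument?renderFormat=all"

# Assumption: the server routes /api/parseDocument, as in the first post's URL.
candidate_url = with_path(failing_url, "/api/parseDocument")
print(candidate_url)
# prints: http://localhost:5010/api/parseDocument?renderFormat=all
```

The resulting `candidate_url` can then be passed to `LayoutPDFReader` in place of the original URL. If the container log still shows a 404 for the new path, the route itself is the problem rather than the port mapping.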