nlmatics / nlm-ingestor

This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0

Receiving an error 'urllib3.exceptions.LocationValueError: No host specified.' #61

Open anirudh-gapblue opened 6 months ago

anirudh-gapblue commented 6 months ago

While trying to run the following code:

from llmsherpa.readers import LayoutPDFReader

# Parser endpoint exposed by the local nlm-ingestor container
llmsherpa_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all&useNewIndentParser=true"
pdf_url = "<path_to_file>//Apple.pdf"  # a file path is also allowed, e.g. /home/downloads/xyz.pdf
# pdf_url = "https://arxiv.org/pdf/2212.14024.pdf"
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)

# Print the text of every chunk together with its section context
for chunk in doc.chunks():
    text = chunk.to_context_text()
    print(text)

I received the following error:

  File "\venv\lib\site-packages\urllib3\poolmanager.py", line 236, in connection_from_host
    raise LocationValueError("No host specified.")
urllib3.exceptions.LocationValueError: No host specified.
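One possible cause, not confirmed in this thread: the backslashes in the traceback suggest Windows, and a Windows path such as C:\... parses as a string with a URL scheme ("c") but no host. If the client decides between "URL" and "local file" by checking for a scheme (an assumption about llmsherpa, not verified here), such a path would be handed to urllib3 as a URL without a host. The sketch below uses only the standard library; describe_target and the sample inputs are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse


def describe_target(path_or_url: str) -> dict:
    """Report how urllib.parse splits a string that may be a URL or a file path."""
    parts = urlparse(path_or_url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the string carries no host
        # A scheme with no host is a shape an HTTP client cannot connect to.
        "scheme_without_host": bool(parts.scheme) and parts.hostname is None,
    }


# Hypothetical inputs, for illustration only.
print(describe_target("https://arxiv.org/pdf/2212.14024.pdf"))
print(describe_target(r"C:\Users\me\Apple.pdf"))
print(describe_target("/home/downloads/xyz.pdf"))
```

If scheme_without_host is True for the value passed to read_pdf, the path itself (rather than the servers) may be worth checking first.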

I have three terminals active, running:

java -jar tika-server-standard-nlm-modified-2.4.1_v6.jar
python -m nlm_ingestor.ingestion_daemon
docker run -p 5010:5001 ghcr.io/nlmatics/nlm-ingestor:latest

All of them appear to be working, since curl -I http://localhost:<port> returns a 200 response for each.
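The same reachability check can be sketched in Python, assuming only the standard library. head_status is a hypothetical helper name; it sends a HEAD request, as a curl header-only request does, and returns the status code.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
```

For example, head_status("http://localhost:5010") checks the port published by the docker command. A 200 here only shows that something is listening on that port; it does not show that the parseDocument endpoint accepts the file being sent.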

Please Help!!

Python version: 3.9.13
urllib3 version: 1.26.6