nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License

Receiving 'urllib3.exceptions.LocationValueError: No host specified.' even though I have a local server up and running #84

Open anirudh-gapblue opened 5 months ago

anirudh-gapblue commented 5 months ago

While trying to use the locally hosted nlm-ingestor API, I am receiving this error: urllib3.exceptions.LocationValueError: No host specified.

I have three command prompts open, each running one of the following: java -jar tika-server-standard-nlm-modified-2.4.1_v6.jar, python -m nlm_ingestor.ingestion_daemon, and docker run -p 5010:5001 ghcr.io/nlmatics/nlm-ingestor:latest.

I can tell all of them are working, since running 'curl -I http://localhost:port' against each one returns a 200 response.

But when I run

from llmsherpa.readers import LayoutPDFReader
llmsherpa_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all&useNewIndentParser=true"
pdf_url = "C:\\Apple.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)

for chunk in doc.chunks():
    text=chunk.to_context_text()
    print(text)

I receive urllib3.exceptions.LocationValueError: No host specified.

Please help!

moshesbeta commented 4 months ago

I encountered the same error when replacing the pdf_url with a local file path. Have you resolved the problem?

Raul824 commented 4 months ago

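This issue occurs because the Windows path is parsed as a URL instead of a local file path: the drive letter before the colon is read as the URL scheme.

from urllib.parse import urlparse
pdf_url = "C:\\Apple.pdf"
print(urlparse(pdf_url).scheme)  # prints "c" (urlparse lowercases the scheme)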

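Workaround: change into the file's directory and pass a relative path, which has no drive letter to be misread as a scheme.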

import os
os.chdir("C:/")
pdf_url = "Apple.pdf"
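
To see which inputs trip the scheme check, here is a minimal sketch using only the standard library. It assumes, as described above, that the reader treats any input with a non-empty scheme as a URL. The helper looks_like_url is hypothetical and not part of llmsherpa; it only illustrates one possible way to tell a drive letter apart from a real scheme.

```python
from urllib.parse import urlparse


def looks_like_url(path_or_url: str) -> bool:
    """Hypothetical helper (not part of llmsherpa).

    Treats a one-letter scheme as a Windows drive letter rather than a
    URL scheme, so "C:\\Apple.pdf" is classified as a local file.
    """
    scheme = urlparse(path_or_url).scheme
    return len(scheme) > 1


candidates = [
    "C:\\Apple.pdf",                # Windows path with backslash
    "C:/Apple.pdf",                 # Windows path with forward slash
    "Apple.pdf",                    # relative path, as in the workaround
    "/home/downloads/xyz.pdf",      # POSIX absolute path
    "https://example.com/xyz.pdf",  # placeholder URL for illustration
]

for candidate in candidates:
    scheme = urlparse(candidate).scheme
    # repr() makes the empty scheme visible as ''
    print(f"{candidate!r}: scheme={scheme!r}, url={looks_like_url(candidate)}")
```

Both Windows-style paths report a one-letter scheme, while the relative and POSIX paths report an empty one, which is why the os.chdir workaround avoids the error.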