nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License

Bug in load_data when using full path #60

Open yoeldk opened 6 months ago

yoeldk commented 6 months ago

This code would fail:

full_path = 'C:\\temp\\A\\test.pdf'
documents = pdf_loader.load_data(full_path)

However, if a relative path is given, it works fine.

It looks like the issue is in file_reader.py, line 63: is_url = urlparse(path_or_url).scheme != ""

With a full Windows path, urlparse returns the drive letter as the scheme (c in this case, since urlparse lowercases it), so the path is treated as a URL instead of a local file.
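
The behavior can be reproduced with the standard library alone. This is a minimal sketch; scheme_of is a hypothetical helper used only for illustration, and the sample paths and URL are made up:

```python
from urllib.parse import urlparse


def scheme_of(path_or_url: str) -> str:
    """Return the scheme that urlparse extracts from the input."""
    return urlparse(path_or_url).scheme


# Illustrative inputs (not taken from llmsherpa itself).
windows_path = "C:\\temp\\A\\test.pdf"
relative_path = "A\\test.pdf"
url = "https://example.com/test.pdf"

print(repr(scheme_of(windows_path)))   # 'c'  -> non-empty, so the check treats it as a URL
print(repr(scheme_of(relative_path)))  # ''   -> treated as a local path
print(repr(scheme_of(url)))            # 'https'
```

Since the check only tests for a non-empty scheme, any absolute path with a drive letter ends up on the URL branch.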

STageAmp commented 6 months ago

I am facing the same problem. Did you find a workaround?

parvpareek commented 2 months ago

You could just change the code to:

        is_url = urlparse(path_or_url).scheme != "" and len(urlparse(path_or_url).scheme) > 2
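
A stricter variant is to accept only known URL schemes instead of relying on the scheme length. The sketch below assumes that only http and https URLs need to be supported; looks_like_url is a hypothetical helper, not part of llmsherpa:

```python
from urllib.parse import urlparse


def looks_like_url(path_or_url: str) -> bool:
    """Return True only for http(s) URLs that include a host.

    Assumption: other schemes (ftp, s3, ...) are not needed here.
    A Windows drive letter parses as a one-letter scheme with an
    empty netloc, so it is rejected.
    """
    parsed = urlparse(path_or_url)
    return parsed.scheme in ("http", "https") and parsed.netloc != ""


print(looks_like_url("https://example.com/test.pdf"))  # True
print(looks_like_url("C:\\temp\\A\\test.pdf"))         # False
print(looks_like_url("A/test.pdf"))                    # False
```

Checking against an allow-list avoids guessing from the scheme length, at the cost of having to extend the tuple if more schemes are ever needed.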