Absolute path issue in read_pdf method

nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects

https://www.nlmatics.com

MIT License

1.15k stars 113 forks

Absolute path issue in read_pdf method #88

Open Nishant-Bansal-777 opened 1 month ago

Nishant-Bansal-777 commented 1 month ago

Hey, I am using llmsherpa to parse PDFs. I have noticed that if I provide the full (absolute) path of a PDF on Windows, is_url evaluates to True, so read_pdf tries to download the PDF instead of going to the else block that reads it from disk. The check is is_url = urlparse(path_or_url).scheme != "" (in file_reader.py, line 63). For example, with pdf_path = 'E:\all_projects\pdf\first.pdf', urlparse reads the drive letter as the URL scheme ('e'), so the scheme is not empty and the path is treated as a URL. To avoid this, I am keeping the PDFs in the project directory and using relative paths.
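The behavior can be reproduced with the standard library alone. The snippet below is a minimal sketch, not llmsherpa's actual code: looks_like_url is a hypothetical helper name, and restricting the check to the http and https schemes is an assumption about which URLs read_pdf needs to download.

```python
from urllib.parse import urlparse


def looks_like_url(path_or_url: str) -> bool:
    """Hypothetical helper: treat only http(s) strings as URLs.

    Assumption: remote PDFs are always fetched over http or https.
    """
    return urlparse(path_or_url).scheme.lower() in ("http", "https")


# Raw string so the backslashes are not read as escape sequences.
windows_path = r"E:\all_projects\pdf\first.pdf"

# The check quoted in the issue: the drive letter is parsed as a scheme,
# so a non-empty scheme is reported for a local Windows path.
print(urlparse(windows_path).scheme != "")

# The stricter check distinguishes the local path from a real URL.
print(looks_like_url(windows_path))
print(looks_like_url("https://example.com/first.pdf"))
```

With a check like this, a drive-letter path would fall through to the local-file branch, while http and https links would still be downloaded.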

MeghaWalia-eco commented 3 weeks ago

Do we have a solution for this yet?