run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License
2.74k stars 263 forks

Unable to parse PDF from directory #268

Open beratkml opened 3 months ago

beratkml commented 3 months ago

Every time I try to run my code, it gives an error saying that the directory doesn't exist:

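# Imports were not shown in the original post; these are the usual ones for this snippet.
import os
from llama_index.core import SimpleDirectoryReader
from llama_parse import LlamaParse

parser = LlamaParse(api_key=os.getenv("LLAMA_CLOUD_API_KEY"), result_type="markdown")
file_extractor = {".pdf": parser}  # route every .pdf through LlamaParse
reader = SimpleDirectoryReader("./data", file_extractor=file_extractor, recursive=True).load_data()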

The error I receive: Error while parsing the file '<bytes/buffer>': file_input must be either a file path string, file bytes, or buffer object

This is my file structure: [image]

Endairion commented 3 months ago

Met the same problem as well

tistu37 commented 3 months ago

I temporarily solved it by applying the changes to the base.py file from the 'fix file_input type issue' pull request. [image]

(The change is in line 166)

mingjun1120 commented 2 months ago

> I temporarily solved it by making the changes in the base.py file from the 'fix file_input type issue' pull request. [image]
>
> (The change is in line 166)

It works!!! Thanks.
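
For anyone who would rather not edit the installed base.py, here is a rough sketch of a different workaround. The error message says file_input may be a file path string, so the idea is to skip SimpleDirectoryReader, collect the PDF paths as plain strings, and hand them to the parser one at a time. It also fails early with a clear message if the directory really is missing. The helper names collect_pdf_paths and parse_pdfs are made up for illustration, and the sketch assumes the parser object exposes load_data(path) returning a list of documents, as the LlamaParse instance in the original post is expected to. It has not been checked against the pull request mentioned above.

```python
from pathlib import Path
from typing import Any, List


def collect_pdf_paths(root: str) -> List[str]:
    """Return sorted plain-string paths of every PDF under root (recursive).

    Hypothetical helper: raises FileNotFoundError if root is not a directory,
    so a wrong working directory is reported before any parsing starts.
    """
    root_path = Path(root).expanduser().resolve()
    if not root_path.is_dir():
        raise FileNotFoundError(f"Not a directory: {root_path}")
    return sorted(
        str(p)  # convert Path objects to str, as the error message asks for
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def parse_pdfs(root: str, parser: Any) -> List[Any]:
    """Call parser.load_data(path) on each PDF and concatenate the results.

    Assumption: parser has a load_data method that accepts a path string
    and returns a list (for example, a LlamaParse instance).
    """
    documents: List[Any] = []
    for path in collect_pdf_paths(root):
        documents.extend(parser.load_data(path))
    return documents
```

With the parser from the first post, this would be called as parse_pdfs("./data", parser). Resolving the directory to an absolute path also makes it easier to see whether './data' points where you expect when the script is run from a different working directory.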