run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

Error while parsing the PDF file: Failed to parse the PDF file #61

Closed valeriodipalo closed 6 days ago

valeriodipalo commented 4 months ago

While working in the online preview, I get this error when running it in a Jupyter notebook:

Error while parsing the PDF file: Failed to parse the PDF file: {"detail":[{"loc":["body","language",0],"msg":"value is not a valid enumeration member; permitted: 'af', 'az', 'bs', 'cs', 'cy', 'da', 'de', 'en', 'es', 'et', 'fr', 'ga', 'hr', 'hu', 'id', 'is', 'it', 'ku', 'la', 'lt', 'lv', 'mi', 'ms', 'mt', 'nl', 'no', 'oc', 'pi', 'pl', 'pt', 'ro', 'rs_latin', 'sk', 'sl', 'sq', 'sv', 'sw', 'tl', 'tr', 'uz', 'vi', 'ar', 'fa', 'ug', 'ur', 'bn', 'as', 'mni', 'ru', 'rs_cyrillic', 'be', 'bg', 'uk', 'mn', 'abq', 'ady', 'kbd', 'ava', 'dar', 'inh', 'che', 'lbe', 'lez', 'tab', 'tjk', 'hi', 'mr', 'ne', 'bh', 'mai', 'ang', 'bho', 'mah', 'sck', 'new', 'gom', 'sa', 'bgc', 'th', 'ch_sim', 'ch_tra', 'ja', 'ko', 'ta', 'te', 'kn'","type":"type_error.enum","ctx":{"enum_values":["af","az","bs","cs","cy","da","de","en","es","et","fr","ga","hr","hu","id","is","it","ku","la","lt","lv","mi","ms","mt","nl","no","oc","pi","pl","pt","ro","rs_latin","sk","sl","sq","sv","sw","tl","tr","uz","vi","ar","fa","ug","ur","bn","as","mni","ru","rs_cyrillic","be","bg","uk","mn","abq","ady","kbd","ava","dar","inh","che","lbe","lez","tab","tjk","hi","mr","ne","bh","mai","ang","bho","mah","sck","new","gom","sa","bgc","th","ch_sim","ch_tra","ja","ko","ta","te","kn"]}}]}
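The response body shows that the `language` field is checked against a fixed list of codes. As a minimal sketch, a code could be checked locally before a request is sent, assuming the list printed in the error message is still current. `PERMITTED_LANGUAGES` and `check_language` are hypothetical names used for illustration and are not part of llama_parse.

```python
# Codes copied from the "permitted" list in the error message.
# Assumption: the server-side list may differ in other versions of the service.
PERMITTED_LANGUAGES = frozenset(
    """
    af az bs cs cy da de en es et fr ga hr hu id is it ku la lt lv mi ms mt
    nl no oc pi pl pt ro rs_latin sk sl sq sv sw tl tr uz vi ar fa ug ur bn
    as mni ru rs_cyrillic be bg uk mn abq ady kbd ava dar inh che lbe lez tab
    tjk hi mr ne bh mai ang bho mah sck new gom sa bgc th ch_sim ch_tra ja ko
    ta te kn
    """.split()
)


def check_language(code: str) -> str:
    """Return the code unchanged if it is permitted, otherwise raise ValueError."""
    if code not in PERMITTED_LANGUAGES:
        raise ValueError(f"{code!r} is not a permitted language code")
    return code


# Hypothetical helper: validates a code before it is passed to the parser.
print(check_language("en"))
```

A check like this only catches an invalid code on the caller's side; it would not help if the client library itself sends a wrong default.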

hexapode commented 4 months ago

This was just fixed by https://github.com/run-llama/llama_parse/pull/60

Can you try to update your llama_parse package?

Thanks for reporting!
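To confirm that an upgrade actually took effect in the notebook's environment, the installed version can be read from Python. This is a minimal sketch that assumes the package is installed under the distribution name `llama-parse`; `installed_version` is a hypothetical helper, not part of llama_parse.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: "llama-parse" is the distribution name used by pip.
print(installed_version("llama-parse"))
```

Running this inside the notebook, rather than in a separate terminal, reports the version the kernel actually imports.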

httplups commented 3 months ago

Try setting the language when creating the LlamaParse object.

abhibarman commented 3 months ago

I was getting the same error with the code below:

from llama_parse import LlamaParse
pdf_file_name = './insurance.pdf'
documents = LlamaParse(result_type="markdown").load_data(pdf_file_name)

The changes below fixed the issue:

from llama_parse import LlamaParse
from llama_parse.base import ResultType, Language
pdf_file_name = './insurance.pdf'

documents = LlamaParse(result_type=ResultType.MD, language=Language.ENGLISH).load_data(pdf_file_name)

BinaryBrain commented 6 days ago

This was probably fixed. I just tried to run:

from llama_parse import LlamaParse
pdf_file_name = './insurance.pdf'
documents = LlamaParse(result_type="markdown").load_data(pdf_file_name)
print(documents)

and got a result. Please reopen if it's still not working on your side.

nikky78 commented 4 days ago

I get the same error with version 0.4.6.

BinaryBrain commented 4 days ago

Hi @nikky78, can you provide your code so I can reproduce the issue?