ray-project / langchain-ray

Examples on how to use LangChain and Ray
Apache License 2.0

embed pdf error #11

Closed colorzhang closed 1 year ago

colorzhang commented 1 year ago

(ReadBinary->FlatMap->FlatMap pid=17023, ip=172.31.69.122) OSError: When reading information for key '2205.13708v1.HiJoNLP_at_SemEval_2022_Task_2_Detecting_Idiomaticity_of_Multiword_Expressions_using_Multilingual_Pretrained_Language_Models.pdf' in bucket 'ray-llm-batch-inference': AWS Error [code 15]: No response body. [repeated 157x across cluster]
(ReadBinary->FlatMap->FlatMap pid=27798, ip=172.31.75.31) invalid pdf header: b'<?xml'
(ReadBinary->FlatMap->FlatMap pid=27798, ip=172.31.75.31) EOF marker not found

kamil-kaczmarek commented 1 year ago

Hi @colorzhang, it looks like you have an invalid PDF file.

colorzhang commented 1 year ago

Thanks, solved by downloading the PDFs to my own bucket.
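
For anyone hitting the same messages: `invalid pdf header: b'<?xml'` and `EOF marker not found` suggest that some of the bytes read from the bucket were not PDF data at all, possibly an XML error response returned in place of the object. Below is a minimal sketch of a pre-check that could run on the raw bytes before they reach a PDF parser. The function names `looks_like_pdf` and `describe_payload` are hypothetical and not part of langchain-ray or Ray, and the 1024-byte search window is an assumption chosen for illustration.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check, not a full validation.

    Assumption: a usable PDF has the "%PDF-" header near the start
    and a "%%EOF" marker near the end of the byte string.
    """
    has_header = b"%PDF-" in data[:1024]
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof


def describe_payload(data: bytes) -> str:
    """Classify raw bytes as 'pdf', 'xml', or 'unknown'."""
    if looks_like_pdf(data):
        return "pdf"
    # An XML payload where a PDF was expected matches the b'<?xml'
    # header reported in the log above.
    if data.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"
```

Records that do not come back as "pdf" could be skipped or logged with their key, so a few bad objects do not fail the whole batch.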