rhymes-ai / Aria

Codebase for Aria - an Open Multimodal Native MoE
Apache License 2.0

Reading PDF files #25

Open andy8025 opened 1 day ago

andy8025 commented 1 day ago

Hi, your blog (in the section titled "Long Multimodal Input - Paper Reading") shows an example of Aria answering a question based on a PDF file: https://www.rhymes.ai/blog-details/aria-first-open-multimodal-native-moe-model

However, when I feed the same PDF file into Aria running locally, it fails with "PIL.UnidentifiedImageError: cannot identify image file 'test.pdf'". Does Aria natively support PDF files as input, or am I just not using the right Python library (PIL)? Thanks.

aria-hacker commented 1 day ago

No, the Aria model accepts only images and text as raw input. For PDF files, you have to preprocess them with the PyMuPDF library. Here are some more details about how to process PDF files. @andy8025
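
For anyone landing here later, this is a minimal sketch of what that preprocessing could look like: render each PDF page to a PIL image with PyMuPDF, then hand the images to the model the same way as any other image input. It is not taken from the Aria codebase. The function names `zoom_for_dpi` and `pdf_to_images` and the default of 150 DPI are my own assumptions for illustration, and it assumes `pymupdf` and `pillow` are installed.

```python
def zoom_for_dpi(dpi: float) -> float:
    """Convert a target DPI to a zoom factor (PDF pages are defined at 72 DPI)."""
    return dpi / 72.0


def pdf_to_images(pdf_path: str, dpi: float = 150):
    """Render every page of a PDF to an RGB PIL image (hypothetical helper)."""
    # Imported here so the module still loads when the optional
    # dependencies are missing; only this function needs them.
    import fitz  # PyMuPDF
    from PIL import Image

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # scale both axes equally

    images = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # alpha=False yields 3-channel RGB samples, matching mode "RGB"
            pix = page.get_pixmap(matrix=matrix, alpha=False)
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            images.append(img)
    return images


if __name__ == "__main__":
    pages = pdf_to_images("test.pdf")  # the file name from the question above
    print(f"rendered {len(pages)} page(s)")
```

The resulting list holds one image per page, so a multi-page paper becomes a multi-image input. Higher DPI keeps small text legible but produces larger images, so the right value depends on the document and on how much memory you have.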