pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

Error when executing the method 'pdf4llm.to_markdown()' in python. #125

Closed Fianax closed 1 month ago

Fianax commented 2 months ago

I am using pdf4llm to convert a PDF to Markdown, but when I call the method 'to_markdown' with the PDF path I get this error: 'not a textpage of this page'.

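This is the code that I run:

import pathlib

import pdf4llm


# Body of the method called by the endpoint (the function name is illustrative).
def pdf_to_markdown(path_pdf, path_md):
    md_text = ''
    try:
        # Convert the whole PDF to a Markdown string.
        md_text = pdf4llm.to_markdown(path_pdf)
        if md_text != '':
            pathlib.Path(path_md).write_bytes(md_text.encode())
    except Exception as e:
        print(f'Error, to_markdown => {path_pdf} Error => {e}')

    return md_text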

This code is inside a method that runs when I call an API (built with FastAPI in Python) that I have running locally.

When I run it on its own, outside the API, it has never failed. But once it runs inside an endpoint and several users call that endpoint, it fails for some of the PDFs, not all of them.

Any idea why this might be or what the error 'not a textpage of this page' means?

JorjMcKie commented 1 month ago

PyMuPDF4LLM, like PyMuPDF, cannot be used in an environment that uses Python threads; see here.
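
A possible workaround, sketched here under assumptions rather than taken from the thread: FastAPI runs plain `def` endpoints in a thread pool, so concurrent requests can reach PyMuPDF from several threads at once. Moving the conversion into a separate worker process avoids that. The names `convert_pdf` and `to_markdown_isolated` are hypothetical helpers, not part of pdf4llm or FastAPI, and the sketch assumes pdf4llm is installed in the worker's environment.

```python
import asyncio
from concurrent.futures import Executor
from typing import Callable


def convert_pdf(path_pdf: str) -> str:
    """Convert one PDF to Markdown; meant to run inside a worker process."""
    # Imported here so the library is loaded in the worker process only.
    # Assumption: pdf4llm is installed in the worker's environment.
    import pdf4llm

    return pdf4llm.to_markdown(path_pdf)


async def to_markdown_isolated(
    path_pdf: str,
    executor: Executor,
    convert: Callable[[str], str] = convert_pdf,
) -> str:
    """Run `convert` in `executor` and await the result.

    With a ProcessPoolExecutor, each conversion happens in a separate
    process, so the web server's threads never call PyMuPDF directly.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, convert, path_pdf)
```

In an application, one `concurrent.futures.ProcessPoolExecutor` would typically be created once at startup and passed as `executor`, with an `async def` endpoint calling `await to_markdown_isolated(path_pdf, pool)`. Whether this fully resolves the error reported above has not been verified here.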