pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

TypeError: startswith first arg must be str or a tuple of str, not list error in version 0.0.11 #110

Closed Sooppin closed 2 months ago

Sooppin commented 2 months ago

Hello,

When using the 'to_markdown(path)' function in version 0.0.11, I receive the following error: "TypeError: startswith first arg must be str or a tuple of str, not list". The same code works fine in version 0.0.10, which suggests that the problem is specific to the latest release. I would appreciate it if you could provide a solution or suggest a fix for this issue.

Thank you!

neatree commented 2 months ago

I’m experiencing the same issue, especially when using the pages parameter.

kriscelmer commented 2 months ago

Same issue.

In file pymupdf4llm/pymupdf4llm/helpers/pymupdf_rag.py, in line 454:

or span0["text"].startswith(bullet)

'bullet' is defined in line 43 as a list, while startswith expects a str or a tuple of str. The same file in v0.0.10 has 'bullet' defined as a tuple.
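
The behavior described above can be reproduced without the library. This is a minimal sketch: the names bullet_list and bullet_tuple and the characters in them are hypothetical stand-ins, not the actual contents of pymupdf_rag.py. It only shows that str.startswith rejects a list and accepts the same values as a tuple.

```python
# Hypothetical stand-in for the module-level 'bullet' constant;
# the real values in pymupdf_rag.py may differ.
bullet_list = ["- ", "* ", "> "]

# Converting to a tuple is enough to satisfy str.startswith.
bullet_tuple = tuple(bullet_list)

text = "- first item"

# Passing a list raises the TypeError reported in this issue.
try:
    text.startswith(bullet_list)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing a tuple checks each prefix in turn and returns a bool.
starts_with_bullet = text.startswith(bullet_tuple)

print(error_message)
print(starts_with_bullet)
```

If upgrading is not possible, wrapping the argument in tuple() at the call site would presumably have the same effect, but that is an assumption about the surrounding code rather than the fix the maintainers applied.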

JorjMcKie commented 2 months ago

Solved with version 0.0.12, published just now.