Closed Jaimish00 closed 1 week ago
Thanks for the feedback!
Please always provide an example PDF page for problem reproduction. In your specific situation you might want to suggest additional bullet point characters to add to that list.
Sure, I've attached it. I just created this simple doc using Google Docs to try.
Moreover, I have been using this package for more complex cases that include parsing different kinds of PDFs, such as documentation and wiki pages. Those may contain other types of bullet points, so this small list of bullet characters might not be sufficient.
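As a stopgap for documents that use other bullet glyphs, the Markdown string could be post-processed on the user side. This is a minimal sketch, assuming the output is a plain `str` and that the glyph actually survives extraction; `EXTRA_BULLETS` and `normalize_bullets` are hypothetical names and not part of pymupdf4llm.

```python
import re

# Hypothetical set of extra bullet glyphs; extend as needed for your PDFs.
EXTRA_BULLETS = "•◦▪▸‣⁃"

# Match optional leading indentation, one bullet glyph, then trailing blanks,
# at the start of every line.
_BULLET_RE = re.compile(
    rf"^([ \t]*)[{re.escape(EXTRA_BULLETS)}][ \t]*",
    flags=re.MULTILINE,
)


def normalize_bullets(md_text: str) -> str:
    """Replace a leading bullet glyph with '- ', keeping the indentation."""
    return _BULLET_RE.sub(r"\1- ", md_text)


sample = "• one\n  ◦ two\nplain"
result = normalize_bullets(sample)
print(result)
```

Lines that do not start with one of the listed glyphs are left unchanged, so this does not help when the glyph is missing from the extracted text.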
Hey @JorjMcKie
Any updates on this?
Yes - there is no current support for multi-level bullet points. This will not be implemented any time soon either.
The more basic issue (of not recognized bullets) is still under investigation. I am currently out of town, so bear with me for at least another week or maybe two.
Fixed in v0.0.17.
Hello there,
First of all thanks for this amazing library 🙌
I am facing some issues with the bullet points in the generated markdown. I have tried several different kinds of bullet points to test if the markdown contains the bullet points and indented bullet points.
For example, I just created a Document file with some bullet points, which you can see below
Then I exported this doc as a PDF and ran `to_markdown` on it, and I got this as the output.
There are a few observations that I have made looking at this output:
- It contains `\n` but not `\t`.
- `-` is missing before the text. Looking at the codebase, I saw that there is a `bullet` list that the text is compared against: https://github.com/pymupdf/RAG/blob/8c0f5009f3d121a9679445b7b551318d77dd967c/pymupdf4llm/pymupdf4llm/helpers/pymupdf_rag.py#L43-L51
But digging a bit deeper into the code, I found that sometimes the bullet points are not even parsed into the text, so there is nothing to check against this bullet list: https://github.com/pymupdf/RAG/blob/8c0f5009f3d121a9679445b7b551318d77dd967c/pymupdf4llm/pymupdf4llm/helpers/pymupdf_rag.py#L505-L506
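To make the two observations concrete, here is a rough sketch of the kind of check I would expect. It is not the code from `pymupdf_rag.py`. The names `BULLETS`, `Line`, and `to_markdown_list`, and the 18-point indent step, are assumptions for illustration only.

```python
from typing import NamedTuple

# Hypothetical bullet characters; not copied from the library.
BULLETS = ("•", "◦", "▪", "-", "*")


class Line(NamedTuple):
    x0: float  # left edge of the line on the page, in points
    text: str  # text extracted for this line


def to_markdown_list(lines, indent_step=18.0):
    """Emit '- ' items, using the left offset to derive a tab indent level."""
    if not lines:
        return ""
    left = min(line.x0 for line in lines)
    out = []
    for line in lines:
        text = line.text.strip()
        if text.startswith(BULLETS):
            # Nesting level guessed from how far the line is indented.
            level = round((line.x0 - left) / indent_step)
            out.append("\t" * level + "- " + text[1:].lstrip())
        else:
            # No bullet glyph in the extracted text: nothing to match.
            out.append(text)
    return "\n".join(out)


lines = [Line(72, "• First"), Line(90, "◦ Nested"), Line(72, "Second")]
md = to_markdown_list(lines)
print(md)
```

The third line shows the case I am hitting: when the glyph never makes it into the extracted text, a comparison against a bullet list cannot succeed, no matter how many characters the list contains.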
Can anyone help me with this?