Workaround about diagram recognizing

pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF

https://pymupdf.readthedocs.io/en/latest/pymupdf4llm

GNU Affero General Public License v3.0


Workaround about diagram recognizing #47

Closed: dantetemplar closed this issue 3 months ago.

dantetemplar commented 3 months ago

I ran into a problem with diagrams while processing presentations from my university.

Can you tell me where in the code I could intercept the diagram processing? I would like to add a fallback to a computer vision model so that the model's response is inserted instead of the diagram block.

By diagram I mean something like this: [example image not preserved]

So far, the output I get is a complete mess:

#### Half Subtractor and Half Adder comparison

𝑎
!
𝑏 !

𝑎
!
𝑏 !

𝑑
!

𝑠
!

XOR

XOR

NOT
𝑏out

AND

NOT
𝑏out

Half Subtractor Circuit:
-  No borrow-in from outside

-  The Subtraction of Least Significant Bit

𝑐out
AND

Half Adder Circuit:
-  No carry-in from outside

-  The Summation of Least Significant Bit

-----

JorjMcKie commented 3 months ago

Could you send me an example page, please? Alternatively, try the parameter "write_images=True". This option should exclude vector graphics and images from text extraction, write the corresponding areas to image files (PNG), and insert Markdown references to these images in the generated text.
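Building on the "write_images=True" suggestion, the computer-vision fallback asked about in the opening post could be approximated without intercepting pymupdf4llm internals: convert with images written out, then post-process the Markdown and replace each image reference with a model's response. This is a minimal sketch under assumptions: `describe` is a hypothetical callable you supply (for example, a wrapper around a vision or OCR model), the helper names are illustrative, and the regex assumes the references use standard `![...](path)` syntax.

```python
import re
from typing import Callable

# Matches Markdown image references such as ![](slides.pdf-0-1.png);
# the path group allows spaces because file names may contain them.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")


def replace_image_refs(md: str, describe: Callable[[str], str]) -> str:
    """Replace every Markdown image reference with describe(path)."""
    return IMAGE_REF.sub(lambda m: describe(m.group(1)), md)


def pdf_to_markdown_with_descriptions(
    pdf_path: str, describe: Callable[[str], str]
) -> str:
    """Hypothetical helper: convert a PDF, then swap images for model output."""
    # Imported lazily so replace_image_refs works without pymupdf4llm installed.
    import pymupdf4llm

    md = pymupdf4llm.to_markdown(pdf_path, write_images=True)
    return replace_image_refs(md, describe)
```

Usage would look like `pdf_to_markdown_with_descriptions("lecture.pdf", my_vision_model)`, where `my_vision_model` is a placeholder for whatever function returns text for an image path. Post-processing keeps the library untouched, at the cost of relying on the image references it emits.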

dantetemplar commented 3 months ago

Can you let me have an example page please? Or try parameter "write_images=True". This option should spare out vector graphics and images from text extraction, write the respective areas out to image files (PNG) and insert Markdown references to these images in the produced text.

For now, I do it this way: find all areas containing images and non-table vector-graphics clusters, and skip text processing in those areas; then take a "screenshot" (create a pixmap) of each area and pass it to Tesseract OCR. I also insert Markdown references to these images.
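The workflow in the comment above could be sketched roughly as follows with PyMuPDF. This is an illustration, not dantetemplar's actual code. It assumes a recent PyMuPDF that provides `Page.find_tables()` and `Page.cluster_drawings()`, plus `pytesseract`, Pillow and a Tesseract binary for OCR; the helper names and the `image_prefix` file naming are hypothetical.

```python
import io
from typing import Sequence

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def overlaps(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the two boxes share an area of positive size."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlaps_any(box: Sequence[float], regions: Sequence[Sequence[float]]) -> bool:
    return any(overlaps(box, r) for r in regions)


def figure_regions(page) -> list[Box]:
    """Image areas plus vector-graphics clusters that do not overlap a table."""
    tables = [tuple(t.bbox) for t in page.find_tables().tables]
    images = [tuple(info["bbox"]) for info in page.get_image_info()]
    clusters = [tuple(r) for r in page.cluster_drawings()]
    return images + [c for c in clusters if not overlaps_any(c, tables)]


def page_to_markdown_with_ocr(page, image_prefix: str = "figure") -> str:
    """Markdown for one page: plain text outside figures, OCR text for figures."""
    # Assumption: pytesseract, Pillow and Tesseract itself are installed.
    import pytesseract
    from PIL import Image

    regions = figure_regions(page)
    items = []  # (y0, x0, markdown) so output can be sorted top-to-bottom
    for x0, y0, x1, y1, text, _block_no, block_type in page.get_text("blocks"):
        # block_type 0 is a text block; skip text lying inside a figure region
        if block_type == 0 and not overlaps_any((x0, y0, x1, y1), regions):
            items.append((y0, x0, text.strip()))
    for i, box in enumerate(regions):
        pix = page.get_pixmap(clip=box, dpi=300)  # "screenshot" of the region
        path = f"{image_prefix}-{page.number}-{i}.png"
        pix.save(path)
        img = Image.open(io.BytesIO(pix.tobytes("png")))
        ocr = pytesseract.image_to_string(img)
        items.append((box[1], box[0], f"![]({path})\n\n{ocr.strip()}"))
    items.sort(key=lambda item: (item[0], item[1]))
    return "\n\n".join(md for _, _, md in items)
```

Sorting by each item's top-left corner is a simple reading-order heuristic that may misorder multi-column slides, and any text block that only slightly touches a figure region is dropped along with the figure.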

dantetemplar commented 3 months ago

We can actually close the issue. I will send my code snippets later.

dantetemplar commented 3 months ago

https://github.com/dantetemplar/pymupdf4llm