pymupdf RAG issues - Githubissues

pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF

https://pymupdf.readthedocs.io/en/latest/pymupdf4llm

GNU Affero General Public License v3.0

518 stars, 81 forks

issues

Ways to identify Heading, subheadings and child heading

#181 Shubhamkumar782 closed 5 days ago
0
Extract Table Content Only for Each Page to Store as Metadata

#179 QuangTQV closed 5 days ago
3
Performance issue with simple PDFs

#178 MrCodingCoderCoding opened 1 week ago
0
Fixes some code samples in the READMEs.

#177 jamie-lemon closed 2 weeks ago
0
Feature request: inlined base64 images in markdown format

#176 sglebs closed 1 week ago
1
Ligatures are not properly handled converting to MD

#175 DiazBejaranoD closed 3 weeks ago
3
Fixed MD links attached to the right span instead of the whole sentence

#174 DiazBejaranoD opened 3 weeks ago
0
First column of table is repeated before the actual table

#173 johnmara-pc14 closed 2 weeks ago
3
feat: added parameter `textflags` to `to_markdown` method

#172 Programmer420-1 opened 3 weeks ago
0
Text rects overlap with tables and images that should be excluded

#171 Meaveryway opened 3 weeks ago
5
Pprados/fix password

#170 pprados closed 2 weeks ago
3
AttributeError: partially initialized module 'pymupdf4llm' has no attribute 'to_markdown' (most likely due to a circular import)

#169 majestichou closed 1 week ago
3
123

#167 dintou closed 1 month ago
0
Update pdf_markdown_reader.py

#166 levente-murgas closed 1 month ago
1
Titles that do not convert to markdown titles

#165 Fianax closed 1 month ago
1
related to the closed issue of annotation/drawings

#164 kingennio opened 1 month ago
3
image extraction broken in 0.17, worked on 0.16

#163 kingennio opened 1 month ago
5
to_markdown isn't outputting all the pages but get_text is

#162 martyphee closed 1 month ago
2
force_text param ignored

#161 kingennio closed 1 month ago
10
Superscript texts are not handled properly within tables

#160 argocan closed 1 month ago
3
A title with various font sizes

#159 Fianax closed 1 month ago
5
Very long titles when converting to markdown

#158 Fianax opened 1 month ago
8
Some PDF pages take a lot of time to convert

#157 imran-pyflow closed 1 month ago
2
Page subtitle located near a table is detected as part of a table

#156 nmakhotkin opened 1 month ago
1
Inconsistent image extraction from image-only PDFs

#155 Cozokim opened 1 month ago
3
No text for some pages of a pdf file

#154 nmakhotkin opened 1 month ago
1
Embedded hyperlink doesn't get extracted in markdown mode

#153 tkcoding closed 1 month ago
1
Some images are missing with new version

#152 Cozokim closed 1 month ago
1
pymupdf4llm markdown function missing first and last line on every page

#151 Devvarat closed 1 month ago
4
0.0.17 seems to output no text

#150 dentro-innovation closed 1 month ago
7
Changes for version 0.0.17

#149 JorjMcKie closed 1 month ago
0
Some images are wrongly extracted

#148 drdsgvo closed 1 month ago
4
Error when page contains nothing but a table

#147 simonschoe closed 1 month ago
4
Stuck for multiple panel text PDFs

#146 bbfrog closed 1 month ago
2
Page header and footer detection is wrong

#145 bbfrog closed 1 month ago
2
Multiple lines parsed as single line

#144 tanchangsheng closed 1 month ago
3
Normal body text parsed as headers

#143 tanchangsheng closed 1 month ago
2
Removes the documentation folder and moves changes to top level MD file.

#142 jamie-lemon closed 2 months ago
0
Address Issue 140

#141 JorjMcKie closed 2 months ago
0
version v.015 missing pix.save(image_filename)

#140 kingennio closed 2 months ago
2
Version 0.0.15

#139 JorjMcKie closed 2 months ago
0
Table is not extracted and some text order was wrong for this PDF

#138 bbfrog closed 2 months ago
4
Bounding boxes for extracted text

#136 simonschoe closed 2 months ago
2
Problem with multiple columns in simple text

#135 pascucg closed 2 months ago
5
Exclude images based on size threshold parameter

#134 kingennio closed 2 months ago
1
Optionally embed images as base64 string

#132 jason-technology closed 2 months ago
3
When write_images=True, the resulting Markdown text doesn't have references to the images.

#131 jason-technology closed 2 months ago
1
Useless variable graphics = [] in pymupdf_rag.py

#130 CedricLor closed 2 months ago
1
Code logic duplicate

#129 CedricLor closed 1 month ago
2
Enhanced image embedding format

#128 zkn365 closed 2 months ago
2