RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0
302 stars
57 forks
issues
A title with various font sizes
#159
Fianax
closed
18 hours ago
5
Very long titles when converting to markdown
#158
Fianax
opened
1 day ago
8
Some PDF pages take a lot of time to convert.
#157
imran-pyflow
closed
2 days ago
2
Page subtitle located near a table is detected as part of a table
#156
nmakhotkin
opened
4 days ago
0
Inconsistent image extraction from image-only PDFs
#155
Cozokim
opened
4 days ago
2
No text for some pages of a pdf file
#154
nmakhotkin
opened
1 week ago
0
Embedded hyperlink doesn't get extracted in markdown mode
#153
tkcoding
closed
1 week ago
1
Some images are missing with new version
#152
Cozokim
closed
1 week ago
1
pymupdf4llm markdown function missing first and last line on every page
#151
Devvarat
closed
1 week ago
4
0.0.17 seems to output no text
#150
dentro-innovation
closed
1 week ago
7
Changes for version 0.0.17
#149
JorjMcKie
closed
1 week ago
0
Some images are wrongly extracted
#148
drdsgvo
closed
1 week ago
4
Error when page contains nothing but a table
#147
simonschoe
closed
1 week ago
4
Stuck for multiple panel text PDFs
#146
bbfrog
closed
2 weeks ago
2
Page header and footer detection is wrong
#145
bbfrog
closed
2 weeks ago
2
Multiple lines parsed as single line
#144
tanchangsheng
closed
2 weeks ago
3
Normal body text parsed as headers
#143
tanchangsheng
closed
2 weeks ago
2
Removes the documentation folder and moves changes to top level MD file.
#142
jamie-lemon
closed
2 weeks ago
0
Address Issue 140
#141
JorjMcKie
closed
2 weeks ago
0
version v.015 missing pix.save(image_filename)
#140
kingennio
closed
2 weeks ago
2
Version 0.0.15
#139
JorjMcKie
closed
2 weeks ago
0
Table is not extracted and some text order was wrong for this PDF
#138
bbfrog
closed
2 weeks ago
4
Bounding boxes for extracted text
#136
simonschoe
closed
2 weeks ago
2
Problem with multiple columns in simple text
#135
pascucg
closed
2 weeks ago
5
Exclude images based on size threshold parameter
#134
kingennio
closed
2 weeks ago
1
Optionally embed images as base64 string
#132
jason-technology
closed
2 weeks ago
3
When write_images=True, the resulting Markdown text doesn't have references to the images.
#131
jason-technology
closed
2 weeks ago
1
Useless variable graphics = [] in pymupdf_rag.py
#130
CedricLor
closed
2 weeks ago
1
Code logic duplicate
#129
CedricLor
closed
1 week ago
2
Enhanced image embedding format
#128
zkn365
closed
2 weeks ago
2
Enhanced handling of line breaks in pdf
#127
zkn365
opened
3 weeks ago
0
Updates for v0.0.14
#126
JorjMcKie
closed
3 weeks ago
0
Error when executing the method 'pdf4llm.to_markdown()' in python.
#125
Fianax
closed
3 weeks ago
1
Version 0.0.13
#123
JorjMcKie
closed
1 month ago
0
Add show_progress option to to_markdown()
#122
zane-programs
closed
3 weeks ago
11
PyMUPDF4llm to_markdown() parallel execution
#121
lequan310
closed
1 month ago
1
Write Images not working
#120
neilbhutada
closed
1 month ago
1
`to_markdown` should accept bytes
#119
MrPupik
closed
1 month ago
1
Pymupdf4llm returns garbage values during parsing a simple page.
#118
AhsanAli1116
closed
1 month ago
6
Handling Graphical Images & Superscripts
#116
SBhat2615
opened
1 month ago
7
Parsing complete scanned document
#115
SBhat2615
closed
1 month ago
2
pymupdf4llm worse than pymupdf on multi-column case. pymupdf4llm merges columns alone sentences.
#113
pseudotensor
closed
1 week ago
4
While trying to use pymupdfllm to save an image, I get the following error: Invalid bandwriter header dimensions/setup
#112
neilbhutada
closed
1 month ago
6
Ensure bullets are a tuple
#111
JorjMcKie
closed
1 month ago
0
TypeError: startswith first arg must be str or a tuple of str, not list error in version 0.0.11
#110
Sooppin
closed
1 month ago
3
Fix tuple/list issue created by recent commit
#109
zane-programs
closed
1 month ago
1
Some fixes
#108
JorjMcKie
closed
1 month ago
0
Some problems with pymupdf4llm: has anyone else run into them?
#106
ghost
closed
1 month ago
1
Text Overlooked Due to Watermark Detection in PDFs
#105
Buckler89
closed
1 month ago
3
How to remove strike through texts?
#104
Alphastream-Admin
closed
1 month ago
1