pymupdf/RAG
RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0
518 stars, 81 forks
issues
#181 Ways to identify Heading, subheadings and child heading (Shubhamkumar782, closed 5 days ago, 0 comments)
#179 Extract Table Content Only for Each Page to Store as Metadata (QuangTQV, closed 5 days ago, 3 comments)
#178 Performance issue with simple PDFs (MrCodingCoderCoding, opened 1 week ago, 0 comments)
#177 Fixes some code samples in the READMEs. (jamie-lemon, closed 2 weeks ago, 0 comments)
#176 Feature request: inlined base64 images in markdown format (sglebs, closed 1 week ago, 1 comment)
#175 Ligatures are not properly handled converting to MD (DiazBejaranoD, closed 3 weeks ago, 3 comments)
#174 Fixed MD links attached to the right span instead of the whole sentence (DiazBejaranoD, opened 3 weeks ago, 0 comments)
#173 First column of table is repeated before the actual table (johnmara-pc14, closed 2 weeks ago, 3 comments)
#172 feat: added parameter `textflags` to `to_markdown` method (Programmer420-1, opened 3 weeks ago, 0 comments)
#171 Text rects overlap with tables and images that should be excluded (Meaveryway, opened 3 weeks ago, 5 comments)
#170 Pprados/fix password (pprados, closed 2 weeks ago, 3 comments)
#169 AttributeError: partially initialized module 'pymupdf4llm' has no attribute 'to_markdown' (most likely due to a circular import) (majestichou, closed 1 week ago, 3 comments)
#167 123 (dintou, closed 1 month ago, 0 comments)
#166 Update pdf_markdown_reader.py (levente-murgas, closed 1 month ago, 1 comment)
#165 Titles that do not convert to markdown titles (Fianax, closed 1 month ago, 1 comment)
#164 related to the closed issue of annotation/drawings (kingennio, opened 1 month ago, 3 comments)
#163 image extraction broken in 0.17, worked on 0.16 (kingennio, opened 1 month ago, 5 comments)
#162 to_markdown isn't outputting all the pages but get_text is (martyphee, closed 1 month ago, 2 comments)
#161 force_text param ignored (kingennio, closed 1 month ago, 10 comments)
#160 Superscript texts are not handled properly within tables (argocan, closed 1 month ago, 3 comments)
#159 A title with various font sizes (Fianax, closed 1 month ago, 5 comments)
#158 Very long titles when converting to markdown (Fianax, opened 1 month ago, 8 comments)
#157 Some pdf pages takes lot of time to converting. (imran-pyflow, closed 1 month ago, 2 comments)
#156 Page subtitle located near a table is detected as part of a table (nmakhotkin, opened 1 month ago, 1 comment)
#155 Inconsistent image extraction from image-only PDFs (Cozokim, opened 1 month ago, 3 comments)
#154 No text for some pages of a pdf file (nmakhotkin, opened 1 month ago, 1 comment)
#153 Embedded hyperlink doesn't get extracted in markdown mode (tkcoding, closed 1 month ago, 1 comment)
#152 Some images are missing with new version (Cozokim, closed 1 month ago, 1 comment)
#151 pymupdf4llm markdown function missing first and last line on every page (Devvarat, closed 1 month ago, 4 comments)
#150 0.0.17 seems to output no text (dentro-innovation, closed 1 month ago, 7 comments)
#149 Changes for version 0.0.17 (JorjMcKie, closed 1 month ago, 0 comments)
#148 Some images are wrongly extracted (drdsgvo, closed 1 month ago, 4 comments)
#147 Error when page contains nothing but a table (simonschoe, closed 1 month ago, 4 comments)
#146 Stuck for multiple panel text PDFs (bbfrog, closed 1 month ago, 2 comments)
#145 Page header and footer detection is wrong (bbfrog, closed 1 month ago, 2 comments)
#144 Multiple lines parsed as single line (tanchangsheng, closed 1 month ago, 3 comments)
#143 Normal body text parsed as headers (tanchangsheng, closed 1 month ago, 2 comments)
#142 Removes the documentation folder and moves changes to top level MD file. (jamie-lemon, closed 2 months ago, 0 comments)
#141 Address Issue 140 (JorjMcKie, closed 2 months ago, 0 comments)
#140 version v.015 missing pix.save(image_filename) (kingennio, closed 2 months ago, 2 comments)
#139 Version 0.0.15 (JorjMcKie, closed 2 months ago, 0 comments)
#138 Table is not extracted and some text order was wrong for this PDF (bbfrog, closed 2 months ago, 4 comments)
#136 Bounding boxes for extracted text (simonschoe, closed 2 months ago, 2 comments)
#135 Problem with multiple columns in simple text (pascucg, closed 2 months ago, 5 comments)
#134 Exclude images based on size threshold parameter (kingennio, closed 2 months ago, 1 comment)
#132 Optionally embed images as base64 string (jason-technology, closed 2 months ago, 3 comments)
#131 When write_images=True, the resulting Markdown text doesn't have references to the images. (jason-technology, closed 2 months ago, 1 comment)
#130 Useless variable graphics = [] in pymupdf_rag.py (CedricLor, closed 2 months ago, 1 comment)
#129 Code logic duplicate (CedricLor, closed 1 month ago, 2 comments)
#128 Enhanced image embedding format (zkn365, closed 2 months ago, 2 comments)