pymupdf RAG issues - Githubissues

pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF

https://pymupdf.readthedocs.io/en/latest/pymupdf4llm

GNU Affero General Public License v3.0

302 stars 57 forks

issues

Improve types for to_markdown function

#46 dantetemplar closed 3 months ago
1
Accept Path object in to_markdown function

#45 dantetemplar closed 3 months ago
10
Instead of file path in string would also like to pass fitz object

#44 narsandu closed 3 months ago
2
Draft: updates logo.

#43 jamie-lemon closed 3 months ago
0
Embedded links inside the table are not extracted

#42 narsandu opened 3 months ago
1
Unable to parse 2-column documents

#41 rahul-dhir-0047 closed 3 months ago
4
Unable to parse double column pdf

#40 smallzhao closed 3 months ago
2
typing errors

#39 fareshan closed 3 months ago
2
Error when loading pdf using python BytesIO: object has no attribute 'page_count'

#38 danmb1979 closed 3 months ago
1
Moves examples into dedicated folder with README.

#35 jamie-lemon closed 4 months ago
0
Suggestion: possibility to have a callback on table

#33 papipsycho closed 3 months ago
1
Integration into langchain

#32 zymbuzz closed 4 months ago
1
adding dpi setting and fixing remaining image output

#31 jakubkovac closed 3 months ago
0
issue with Heading since version 0.0.5

#30 papipsycho closed 3 months ago
5
Bug in pymupdf_rag

#29 SergioG-M closed 4 months ago
1
Update image inclusion

#28 JorjMcKie closed 4 months ago
0
Update pymupdf_rag.py to fix compatibility issue with Python 3.9

#27 shenyimings closed 4 months ago
1
Remove headers & Footers

#26 G-Slient closed 4 months ago
1
Update api.rst

#25 JorjMcKie closed 4 months ago
0
Adds more info to homepage for latest version and starts API doc.

#24 jamie-lemon closed 4 months ago
2
Many text block became image since the version 0.0.3

#23 papipsycho closed 4 months ago
13
No attribute to_markdown

#22 aman-vink closed 4 months ago
4
Images in table

#21 chillyoung4679 opened 4 months ago
1
[Llama] Fixed Async load

#20 YanSte closed 4 months ago
0
Table text without new lines

#19 papipsycho closed 3 months ago
3
Image background can cause text extraction to fail

#18 maxjeblick closed 4 months ago
3
'pymupdf4llm' has no attribute 'to_markdown'

#17 zzzcccxx closed 4 months ago
2
add expected type and some typings

#16 fareshan closed 4 months ago
1
[Llama] Improved int page structure and naming

#13 YanSte closed 4 months ago
2
[Hotfix] Fixed import and method used for version 0.0.2 (Import issue)

#12 YanSte closed 4 months ago
10
No module named 'get_text_lines' in pymupdf4llm

#15 vzegna closed 4 months ago
4
Preserving page information when creating markdown file from pdf

#11 SamGalanakis closed 4 months ago
2
improve typings to accept range and None

#10 fareshan closed 4 months ago
2
Update logic to set same_line flag based on case of first character

#9 tamdao closed 4 months ago
0
to_markdown() for two-column pdf

#8 MahtabF closed 4 months ago
3
[PDFMardownReader] LlamaIndex Reader

#7 YanSte closed 4 months ago
8
Ignore the header and footer of PDF

#6 difonjohaiv closed 4 months ago
2
Avoid add header to an empty line

#5 Louis-udm closed 3 months ago
2
[Suggestion] PDFReader with LlamaIndex BaseReader and insertion in Llama Hub

#4 YanSte closed 4 months ago
4
Not all PDFs have fontsizes

#3 bartdegoede closed 5 months ago
1
Generation Speed

#2 fareshan closed 5 months ago
2
Specified encoding when opening to fix undefined characters

#1 dragoa closed 5 months ago
1