metachris/pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0
1.05k stars, 115 forks
issues
added sorting option by using list
#59
Masterjx9
opened
1 year ago
0
Fixed handling of multi-line link extraction
#58
maximiliancw
opened
1 year ago
0
Create SECURITY.xml
#57
1989shack
opened
2 years ago
0
Point pdf links to local files downloaded - feature request
#56
maguilella
opened
2 years ago
1
unable to install via easy_install
#55
ab050505
closed
2 years ago
1
Recursive URL extraction from PDFs - feature request
#54
LostAccount
opened
2 years ago
0
Title detection heuristics
#53
dufferzafar
closed
3 years ago
3
Detect metadata from Arxiv Documents
#52
dufferzafar
opened
3 years ago
0
Adding Timeout CLI parameter
#51
dustywhite7
opened
3 years ago
0
getting a 400 for twitter.com
#50
jimustafa
opened
3 years ago
0
fix: support charset ISO-8859-1, closes #48
#49
Helias
opened
3 years ago
2
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte
#48
Helias
opened
3 years ago
2
JSON Output for Check Links Subcommand
#47
ohsh6o
opened
3 years ago
0
url in hyperlink
#46
vinayaksable2399
opened
3 years ago
1
Changed desc.getchildren() to desc.iter()
#45
vmanke
closed
3 years ago
2
Adding a close() method and context manager methods for class PDFx
#44
PierreSelim
opened
4 years ago
1
timeout option
#43
DanielRuf
opened
4 years ago
2
Thanks a lot, and a question
#42
MohammedAlrozzi
closed
3 years ago
1
Copious INFO logging
#41
tripleee
closed
3 years ago
2
Cuts off links that span two lines
#40
marshalmiller
opened
4 years ago
3
Include Check-Links Results in Output
#39
marshalmiller
closed
3 years ago
1
How to get HyperText(not HyperLink)?
#38
hoelan
opened
4 years ago
0
PDFx won't see links in some PDFs
#37
ghost
opened
5 years ago
1
AttributeError: 'PDFObjRef' object has no attribute 'decode'
#36
DeepLearner7
closed
3 years ago
1
Checking a list of PDF URLs
#35
chenarub
opened
5 years ago
0
Detects pdf URLs that end with parameters (e.g. ?dl=1 on dropbox)
#34
daviddekoning
closed
3 years ago
1
Replaced MarkDown syntax with RST syntax
#33
nicolas-raoul
closed
3 years ago
1
TypeError: '<' not supported between instances of 'tuple' and 'int'
#32
sarora
closed
3 years ago
6
"URI" in PDF attributes may be a string itself
#31
theiostream
opened
6 years ago
1
PDF references should not be treated as such based on extension
#30
theiostream
opened
6 years ago
0
Include tests in PyPI tarball
#29
dotlambda
closed
3 years ago
3
Replace pdfminer dependency with pdfminer.six
#28
marsam
closed
3 years ago
1
AttributeError: 'NoneType' object has no attribute 'findall'
#27
sdwarwick
closed
3 years ago
3
SSL Error?
#26
markratledge
closed
3 years ago
1
Internal links - enhancement request
#25
rupertlevene
opened
7 years ago
0
Fails if output is piped
#24
MagicalTux
closed
3 years ago
1
PDF fails to open if special character in path
#23
oliviercailloux
closed
3 years ago
1
Combine downloaded pdfs into one file / pdf portfolio
#22
Leopoldnak
closed
3 years ago
0
URLs truncated at line endings
#21
bitsgalore
opened
8 years ago
4
Error when running pdf x
#20
jreme100
closed
3 years ago
3
Embedded URLs not being picked up?
#19
gcoladon
opened
8 years ago
0
Way to check only real hyperlinks
#18
capncodewash
opened
8 years ago
2
pdfx reports mailto: links as an error ('nonnumeric port')
#17
capncodewash
opened
8 years ago
3
Unable to install pdfx
#16
fl0x2208
closed
3 years ago
5
Unable to install
#15
dtolj
closed
3 years ago
3
PDFx is storing prior parsed PDFs causing incorrect references / annotations to be found
#14
scottwernervt
closed
8 years ago
1
Check for Unicode chars in PDF files
#13
davemcphee
closed
9 years ago
4
fixes pep8 warnings in extractor.py and libs/xmp.py
#12
taranjeet
closed
9 years ago
0
readme
#11
HarryHamilton
closed
9 years ago
0
pep8 changes across various files
#10
taranjeet
closed
9 years ago
1