iipc/webarchive-commons
Common web archive utility code.
Apache License 2.0
49 stars
72 forks
issues
Drop dependency on log4j 1
#96
kris-sigur
opened
8 months ago
0
Apache httpclient 3.1 sonatype
#95
DEBARYYA
opened
1 year ago
0
Consider syncing up from the Common Crawl fork
#94
anjackson
closed
2 years ago
1
Compressed WARC InputStream is closed by record iterator.
#93
tlipkis
opened
3 years ago
0
Bump commons-io from 2.4 to 2.7
#92
dependabot[bot]
opened
3 years ago
0
Fix InterruptibleCharSequenceTest (testInterruptibility) to run on JDK 11
#91
sebastian-nagel
closed
3 years ago
0
Bump junit from 3.8.1 to 4.13.1
#90
dependabot[bot]
closed
3 years ago
0
WAT extractor: do not fail on missing WARC-Filename in warcinfo record
#89
sebastian-nagel
closed
4 years ago
1
WAT extractor: do not fail on missing WARC-Filename in warcinfo record
#88
sebastian-nagel
closed
4 years ago
0
Prevent from stackoverflow by limiting length of matched pattern
#87
sebastian-nagel
opened
4 years ago
1
ExtractingParseObserver: extract rel, hreflang and type attributes
#86
sebastian-nagel
closed
4 years ago
0
ExtractingParseObserver: extract links from onClick attributes
#85
sebastian-nagel
closed
4 years ago
1
Replace the org.json dependency by openjson library
#84
sebastian-nagel
opened
4 years ago
5
Update TravisCI config; resolves #82.
#83
ruebot
closed
4 years ago
4
Update TravisCI config
#82
ruebot
closed
4 years ago
0
CompressedWARCReader does not work for Common Crawl WARC files.
#81
YossiTamari
closed
5 years ago
3
Fixing bad dates in WARC file
#80
cjer
closed
6 years ago
6
Update API documentation to reflect current behaviour:
#79
anjackson
opened
6 years ago
1
commons-httpclient-3.1 vulnerability
#78
ldko
opened
6 years ago
1
use commons-collections v3.2.2 to avoid v3.2.1 vulnerability
#77
ndushay
closed
6 years ago
2
upgrade to commons-collections.jar 3.2.2
#76
ndushay
closed
6 years ago
0
Extract also `property` attributes of HTML meta elements
#75
sebastian-nagel
closed
7 years ago
1
Do not add value of preceding HTTP header field if there is no value
#74
sebastian-nagel
closed
7 years ago
3
Move missing unit tests over from Heritrix3
#73
MohammedElsayyed
closed
7 years ago
4
Improve HTML link extraction
#72
sebastian-nagel
closed
7 years ago
0
Logging changes for next release.
#71
ldko
closed
7 years ago
0
Whatwg conformant uri
#70
johnerikhalse
closed
7 years ago
0
URLParser to strip empty port
#69
sebastian-nagel
closed
7 years ago
1
Use CharsetDetector to guess encoding of HTML documents
#68
sebastian-nagel
closed
7 years ago
0
Add attribute "property" of HTML meta elements to WAT HTML-Metadata
#67
sebastian-nagel
closed
7 years ago
1
support WET files
#66
dportabella
closed
7 years ago
3
fix: last header was lost if LF LF (instead of CRLF CRLF)
#65
dportabella
closed
7 years ago
4
HTTPS via a Proxy
#64
PsypherPunk
opened
7 years ago
1
Make regular expression to extract URLs from CSS more restrictive
#63
sebastian-nagel
closed
7 years ago
7
Remove invalid constant
#62
kris-sigur
closed
7 years ago
0
empty header fields populated from previous value
#61
ghost
opened
7 years ago
0
Non-ascii mimetypes
#60
ghost
opened
7 years ago
0
dns records in ARCs
#59
ghost
opened
7 years ago
0
urls with spaces unescaped
#58
ghost
opened
7 years ago
1
StringIndexOutOfBoundsException in patternCSSExtract
#57
sebastian-nagel
closed
7 years ago
2
Require Java 8
#56
johnerikhalse
opened
8 years ago
0
Reorganize into mother and child pom
#55
johnerikhalse
opened
8 years ago
2
Make canonicalizer be able to strip session id params even if they ar…
#54
vonrosen
closed
8 years ago
4
Allow chars in querystring before params to strip
#53
vonrosen
closed
8 years ago
0
Store origin-code in ARCRecord header
#52
jrwiebe
closed
8 years ago
7
flush output etc before tallying stats to fix sizeOnDisk calculation
#51
nlevitt
closed
8 years ago
0
fix for HER-2089 -
#50
nlevitt
closed
8 years ago
0
Empty String argument in CharsetDetector.trimAttrValue causes StringIndexOutOfBoundsException
#49
gerhardgossen
closed
9 years ago
1
WAT extractor: Entity-Trailing-Slop-Bytes should be called Entity-Trailing-Slop-Length
#48
ldko
closed
8 years ago
0
WAT extractor: adding information in WAT's warcinfo
#47
scheylord
closed
8 years ago
0