iipc/webarchive-commons
Common web archive utility code.
Apache License 2.0
50 stars, 71 forks
issues
Bump commons-io:commons-io from 2.7 to 2.14.0
#99
dependabot[bot]
closed
1 month ago
0
Make MetaData multi-valued to preserve values of repeating WARC and HTTP headers
#98
sebastian-nagel
opened
1 month ago
0
Bump org.json:json from 20131018 to 20231013
#97
dependabot[bot]
closed
1 month ago
2
Drop dependency on log4j 1
#96
kris-sigur
closed
1 month ago
0
Apache httpclient 3.1 sonatype
#95
DEBARYYA
opened
1 year ago
0
Consider syncing up from the Common Crawl fork
#94
anjackson
closed
2 years ago
1
Compressed WARC InputStream is closed by record iterator.
#93
tlipkis
opened
3 years ago
0
Bump commons-io from 2.4 to 2.7
#92
dependabot[bot]
closed
1 month ago
0
Fix InterruptibleCharSequenceTest (testInterruptibility) to run on JDK 11
#91
sebastian-nagel
closed
3 years ago
0
Bump junit from 3.8.1 to 4.13.1
#90
dependabot[bot]
closed
4 years ago
0
WAT extractor: do not fail on missing WARC-Filename in warcinfo record
#89
sebastian-nagel
closed
4 years ago
1
WAT extractor: do not fail on missing WARC-Filename in warcinfo record
#88
sebastian-nagel
closed
4 years ago
0
Prevent from stackoverflow by limiting length of matched pattern
#87
sebastian-nagel
opened
5 years ago
1
ExtractingParseObserver: extract rel, hreflang and type attributes
#86
sebastian-nagel
closed
5 years ago
0
ExtractingParseObserver: extract links from onClick attributes
#85
sebastian-nagel
closed
5 years ago
1
Replace the org.json dependency by openjson library
#84
sebastian-nagel
opened
5 years ago
5
Update TravisCI config; resolves #82.
#83
ruebot
closed
5 years ago
4
Update TravisCI config
#82
ruebot
closed
5 years ago
0
CompressedWARCReader does not work for Common Crawl WARC files.
#81
YossiTamari
closed
6 years ago
3
Fixing bad dates in WARC file
#80
cjer
closed
6 years ago
6
Update API documentation to reflect current behaviour:
#79
anjackson
opened
6 years ago
1
commons-httpclient-3.1 vulnerability
#78
ldko
opened
7 years ago
1
use commons-collections v3.2.2 to avoid v3.2.1 vulnerability
#77
ndushay
closed
7 years ago
2
upgrade to commons-collections.jar 3.2.2
#76
ndushay
closed
7 years ago
0
Extract also `property` attributes of HTML meta elements
#75
sebastian-nagel
closed
7 years ago
1
Do not add value of preceding HTTP header field if there is no value
#74
sebastian-nagel
closed
7 years ago
3
Move missing unit tests over from Heritrix3
#73
MohammedElsayyed
closed
7 years ago
4
Improve HTML link extraction
#72
sebastian-nagel
closed
7 years ago
0
Logging changes for next release.
#71
ldko
closed
7 years ago
0
Whatwg conformant uri
#70
johnerikhalse
closed
7 years ago
0
URLParser to strip empty port
#69
sebastian-nagel
closed
7 years ago
1
Use CharsetDetector to guess encoding of HTML documents
#68
sebastian-nagel
closed
7 years ago
0
Add attribute "property" of HTML meta elements to WAT HTML-Metadata
#67
sebastian-nagel
closed
7 years ago
1
support WET files
#66
dportabella
closed
8 years ago
3
fix: last header was lost if LF LF (instead of CRLF CRLF)
#65
dportabella
closed
7 years ago
4
HTTPS via a Proxy
#64
PsypherPunk
opened
8 years ago
1
Make regular expression to extract URLs from CSS more restrictive
#63
sebastian-nagel
closed
7 years ago
7
Remove invalid constant
#62
kris-sigur
closed
7 years ago
0
empty header fields populated from previous value
#61
ghost
opened
8 years ago
0
Non-ascii mimetypes
#60
ghost
opened
8 years ago
0
dns records in ARCs
#59
ghost
opened
8 years ago
0
urls with spaces unescaped
#58
ghost
opened
8 years ago
1
StringIndexOutOfBoundsException in patternCSSExtract
#57
sebastian-nagel
closed
7 years ago
2
Require Java 8
#56
johnerikhalse
opened
8 years ago
0
Reorganize into mother and child pom
#55
johnerikhalse
opened
8 years ago
2
Make canonicalizer be able to strip session id params even if they ar…
#54
vonrosen
closed
8 years ago
4
Allow chars in querystring before params to strip
#53
vonrosen
closed
8 years ago
0
Store origin-code in ARCRecord header
#52
jrwiebe
closed
8 years ago
7
flush output etc before tallying stats to fix sizeOnDisk calculation
#51
nlevitt
closed
8 years ago
0
fix for HER-2089 -
#50
nlevitt
closed
8 years ago
0