iipc/jwarc
Java library for reading and writing WARC files with a typed API
Apache License 2.0, 48 stars, 8 forks
Issues (newest first)
#92 IPv6 canonicalization failures (sebastian-nagel, closed 3 days ago, 1 comment)
#91 URIs.parseLeniently modifies URIs containing percent-encoded characters (sebastian-nagel, closed 5 days ago, 4 comments)
#90 URIs.parseLeniently fails to parse WARC-Target-URI with query and empty path with no slash before query (sebastian-nagel, closed 5 days ago, 1 comment)
#89 CdxRequestEncoder: warcio.js compatible encoding mode (ato, opened 1 week ago, 0 comments)
#88 Utility method to decode payload using the HTTP Content-Encoding header (sebastian-nagel, closed 1 week ago, 1 comment)
#87 ValidateTool: fix digest validation (sebastian-nagel, closed 2 weeks ago, 0 comments)
#86 ParsingException when reading ClueWeb09 files (gijshendriksen, closed 4 months ago, 3 comments)
#85 UncheckedIOException, unexpected end of gzip (gleporeNARA, closed 9 months ago, 1 comment)
#84 UncheckedIOException, invalid WARC record error (gleporeNARA, closed 8 months ago, 3 comments)
#83 IllegalArgumentException on ARC Parsing (gleporeNARA, closed 9 months ago, 1 comment)
#82 ARC variants with different interpretations of version-block length (tballison, closed 9 months ago, 7 comments)
#81 Custom records awkward to register due to package private constructors in the default records (vlofgren, opened 11 months ago, 1 comment)
#80 jwarc failing to read Http body at the second pass of a file (vlofgren, closed 11 months ago, 3 comments)
#79 Multithreading issue on GzipChannel write header (creyer, closed 1 year ago, 1 comment)
#78 CDX indexer fails to parse (webrecorder) WARC file and terminates. (thomasegense, closed 1 year ago, 2 comments)
#77 Cdxtool output refactoring (thomasegense, closed 1 year ago, 0 comments)
#76 CDX-tool has option to use full path to WARC-file in CDX-line. (thomasegense, closed 1 year ago, 1 comment)
#75 GitHub 71 include revisits (thomasegense, closed 1 year ago, 1 comment)
#74 Implemented github-73. Keeping WARC payload digest unchanged for CDX (thomasegense, closed 1 year ago, 1 comment)
#73 CDX indexer: Keep calculated digest from WARC header (thomasegense, closed 1 year ago, 1 comment)
#72 CDX indexer: CDXJ output support (ato, opened 1 year ago, 0 comments)
#71 CDX indexer: support revisit records (ato, closed 1 year ago, 0 comments)
#70 invalid HTTP message at byte position 6: HTTP/2<-- HERE --> 200 (nice-redbull, closed 1 year ago, 1 comment)
#69 disable serviceworker in replay proxy mode (sberequek, closed 1 year ago, 3 comments)
#68 Add WarcRevisit Builder with String targetURI (robertvanloenhout, closed 1 year ago, 0 comments)
#67 WarcRevisit Builder with String targetURI (robertvanloenhout, closed 1 year ago, 1 comment)
#66 Build WarcRevisit with refersTo String targetURI (robertvanloenhout, closed 2 years ago, 0 comments)
#65 Build WarcRevisit with refersTo String targetURI (robertvanloenhout, closed 2 years ago, 1 comment)
#64 Request/Response Builder with String targetURI (robertvanloenhout, closed 2 years ago, 1 comment)
#63 Request/Response Builder with String targetURI (robertvanloenhout, closed 2 years ago, 2 comments)
#62 ARC parser infinite loop reading body (sebastian-nagel, closed 3 years ago, 1 comment)
#61 Native OSX / Linux binaries do not work (ikreymer, closed 3 years ago, 3 comments)
#60 Add validate tool (sebastian-nagel, closed 3 years ago, 3 comments)
#59 WarcDigest: handle SHA-2 digests, support Base64 encoding, auto-detect encoding (sebastian-nagel, closed 3 years ago, 0 comments)
#58 RecordBuilder: Date/Timestamp truncated if .date(..) is called before .version(WARC_1_1) (lambdaupb, closed 3 years ago, 1 comment)
#57 replay proxy doesn't start because of sw.js file not found (mbreevoort, closed 3 years ago, 1 comment)
#56 Recording proxy with browser javax.net.ssl.SSLHandshakeException (sberequek, opened 3 years ago, 4 comments)
#55 GunzipChannel fails on payload with uncompressed size exceeding int_max (sebastian-nagel, closed 3 years ago, 1 comment)
#54 GunzipChannel fails on payload with uncompressed size exceeding int_max (sebastian-nagel, closed 3 years ago, 0 comments)
#53 Gzip compression (alex73, closed 3 years ago, 1 comment)
#52 Bump junit from 4.12 to 4.13.1 (dependabot[bot], closed 3 years ago, 0 comments)
#51 lenient http parser: allow empty field names and invalid characters (ato, closed 4 years ago, 0 comments)
#50 How to parse not standard http header? avoid not throw exception? (ysykzheng, closed 4 years ago, 3 comments)
#49 Add method payload() to overview of classes and methods (sebastian-nagel, closed 4 years ago, 0 comments)
#48 Utility methods to read payload body (sebastian-nagel, opened 4 years ago, 1 comment)
#47 Extract tool: add support for content-encoding `deflate` and `identity` (sebastian-nagel, closed 4 years ago, 0 comments)
#46 GzipChannel write() to return consumed bytes, fixes #44 (sebastian-nagel, closed 4 years ago, 1 comment)
#45 ByteBuffer inflate and deflate support (ato, opened 4 years ago, 0 comments)
#44 GzipChannel write() returns compressed length rather than buffer consumption (ato, closed 4 years ago, 1 comment)
#43 Improve logging of MediaType parse errors (sebastian-nagel, closed 4 years ago, 0 comments)