iipc/jwarc
Java library for reading and writing WARC files with a typed API
Apache License 2.0
45 stars, 8 forks
issues
ParsingException when reading ClueWeb09 files
#86
gijshendriksen
closed
3 weeks ago
3
UncheckedIOException, unexpected end of gzip
#85
gleporeNARA
closed
4 months ago
1
UncheckedIOException, invalid WARC record error
#84
gleporeNARA
closed
4 months ago
3
IllegalArgumentException on ARC Parsing
#83
gleporeNARA
closed
5 months ago
1
ARC variants with different interpretations of version-block length
#82
tballison
closed
5 months ago
7
Custom records awkward to register due to package private constructors in the default records
#81
vlofgren
opened
7 months ago
1
jwarc failing to read Http body at the second pass of a file
#80
vlofgren
closed
7 months ago
3
Multithreading issue on GzipChannel write header
#79
creyer
closed
9 months ago
1
CDX indexer fails to parse (webrecorder) WARC file and terminates.
#78
thomasegense
closed
11 months ago
2
Cdxtool output refactoring
#77
thomasegense
closed
1 year ago
0
CDX-tool has option to use full path to WARC-file in CDX-line.
#76
thomasegense
closed
1 year ago
1
GitHub 71 include revisits
#75
thomasegense
closed
1 year ago
1
Implemented github-73. Keeping WARC payload digest unchanged for CDX
#74
thomasegense
closed
1 year ago
1
CDX indexer: Keep calculated digest from WARC header
#73
thomasegense
closed
1 year ago
1
CDX indexer: CDXJ output support
#72
ato
opened
1 year ago
0
CDX indexer: support revisit records
#71
ato
closed
1 year ago
0
invalid HTTP message at byte position 6: HTTP/2<-- HERE --> 200
#70
nice-redbull
closed
12 months ago
1
disable serviceworker in replay proxy mode
#69
sberequek
closed
1 year ago
3
Add WarcRevisit Builder with String targetURI
#68
robertvanloenhout
closed
1 year ago
0
WarcRevisit Builder with String targetURI
#67
robertvanloenhout
closed
1 year ago
1
Build WarcRevisit with refersTo String targetURI
#66
robertvanloenhout
closed
1 year ago
0
Build WarcRevisit with refersTo String targetURI
#65
robertvanloenhout
closed
1 year ago
1
Request/Response Builder with String targetURI
#64
robertvanloenhout
closed
2 years ago
1
Request/Response Builder with String targetURI
#63
robertvanloenhout
closed
2 years ago
2
ARC parser infinite loop reading body
#62
sebastian-nagel
closed
2 years ago
1
Native OSX / Linux binaries do not work
#61
ikreymer
closed
2 years ago
3
Add validate tool
#60
sebastian-nagel
closed
2 years ago
3
WarcDigest: handle SHA-2 digests, support Base64 encoding, auto-detect encoding
#59
sebastian-nagel
closed
2 years ago
0
RecordBuilder: Date/Timestamp truncated if .date(..) is called before .version(WARC_1_1)
#58
lambdaupb
closed
2 years ago
1
replay proxy doesn't start because of sw.js file not found
#57
mbreevoort
closed
3 years ago
1
Recording proxy with browser javax.net.ssl.SSLHandshakeException
#56
sberequek
opened
3 years ago
4
GunzipChannel fails on payload with uncompressed size exceeding int_max
#55
sebastian-nagel
closed
3 years ago
1
GunzipChannel fails on payload with uncompressed size exceeding int_max
#54
sebastian-nagel
closed
3 years ago
0
Gzip compression
#53
alex73
closed
3 years ago
1
Bump junit from 4.12 to 4.13.1
#52
dependabot[bot]
closed
3 years ago
0
lenient http parser: allow empty field names and invalid characters
#51
ato
closed
4 years ago
0
How to parse non-standard HTTP headers without throwing an exception?
#50
ysykzheng
closed
4 years ago
3
Add method payload() to overview of classes and methods
#49
sebastian-nagel
closed
4 years ago
0
Utility methods to read payload body
#48
sebastian-nagel
opened
4 years ago
1
Extract tool: add support for content-encoding `deflate` and `identity`
#47
sebastian-nagel
closed
4 years ago
0
GzipChannel write() to return consumed bytes, fixes #44
#46
sebastian-nagel
closed
4 years ago
1
ByteBuffer inflate and deflate support
#45
ato
opened
4 years ago
0
GzipChannel write() returns compressed length rather than buffer consumption
#44
ato
closed
4 years ago
1
Improve logging of MediaType parse errors
#43
sebastian-nagel
closed
4 years ago
0
Raw header access
#42
ato
closed
2 years ago
1
Tool to extract a WARC record (or its headers or payload)
#41
sebastian-nagel
closed
4 years ago
1
Don't throw on duplicate keys in media type parameters
#40
ato
closed
4 years ago
0
Avoid unchecked exceptions caused by malformed HTTP captures
#39
sebastian-nagel
closed
4 years ago
1
Allow optional space after chunk-size in chunked transfer-encoding, fixes #33
#38
sebastian-nagel
closed
4 years ago
1
Fix extraction of payload without HTTP Content-Length header, fixes #36
#37
sebastian-nagel
closed
4 years ago
0