iipc / jwarc

Java library for reading and writing WARC files with a typed API
Apache License 2.0

Update and complete benchmarks #19

Closed sebastian-nagel closed 4 years ago

sebastian-nagel commented 4 years ago

A couple of updates to the comparison/benchmarking tool, also as a basis for discussing possible further performance improvements.

Briefly, the results (on a gzipped WARC file):

Output:

Benchmarking CC-MAIN-20191207160050-20191207184050-00031.warc.gz
iteration 1
gzipinputstream (buffer 8kB)  in 12253ms
gzipinputstream (buffer 64kB)  in 10904ms
webarchive-commons 133945 in 40756ms
webarchive-commons (no digest check) 133945 in 24176ms
jwat buff 133945 in 21584ms
jwarc 133945 in 14623ms

iteration 2
gzipinputstream (buffer 8kB)  in 12583ms
gzipinputstream (buffer 64kB)  in 11482ms
webarchive-commons 133945 in 43104ms
webarchive-commons (no digest check) 133945 in 23460ms
jwat buff 133945 in 20800ms
jwarc 133945 in 14962ms

iteration 3
gzipinputstream (buffer 8kB)  in 12953ms
gzipinputstream (buffer 64kB)  in 12496ms
webarchive-commons 133945 in 44103ms
webarchive-commons (no digest check) 133945 in 24895ms
jwat buff 133945 in 19573ms
jwarc 133945 in 13978ms
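
For context on the two gzipinputstream baseline rows, here is a minimal sketch of how such a baseline could be measured with only the JDK. It is not the benchmark code from this change: the class name GzipBaseline and the helpers drain and gzip are hypothetical, and it assumes that "buffer 8kB" / "buffer 64kB" refer to the size argument of the GZIPInputStream constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical baseline: decompress a .gz file and do nothing else,
// to see how much of the total time is pure gzip decoding.
public class GzipBaseline {

    /** Decompresses the whole stream and returns the number of uncompressed bytes. */
    static long drain(InputStream compressed, int bufferSize) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        // bufferSize is the internal input buffer of GZIPInputStream (assumed
        // to be what the "buffer 8kB" / "buffer 64kB" labels refer to).
        try (GZIPInputStream gzip = new GZIPInputStream(compressed, bufferSize)) {
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Helper for experiments: compresses data into a single gzip member. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GzipBaseline <file.warc.gz>");
            return;
        }
        for (int bufferSize : new int[] {8 * 1024, 64 * 1024}) {
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                long start = System.nanoTime();
                long bytes = drain(in, bufferSize);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("gzipinputstream (buffer %dkB) %d bytes in %dms%n",
                        bufferSize / 1024, bytes, ms);
            }
        }
    }
}
```

Run it with the path of a .warc.gz file as the only argument. Since the loop only decompresses and parses no records, its timing is a rough lower bound for any reader built on java.util.zip.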

Profile taken with async-profiler (interactive SVG: bench.2020-01-17-17-57.async-prof.svg.gz)

ato commented 4 years ago

Interesting. Thanks!