Open machawk1 opened 7 years ago
I think har2warc is the way to go. No need to duplicate the work already done.
Related: https://github.com/jbenet/http2ipfs
Currently, when given a HAR file, the indexer emits only the CDXJ header lines and then exits, because it finds no WARC records:
!context ["http://oduwsdl.github.io/contexts/cdxj"]
!meta {"created_at": "2017-07-03T23:33:54.579613", "generator": "InterPlanetary Wayback v.0.2017.06.29.1331"}
ipwb indexer currently supports WARC files as input. HTTP Archive (HAR) files may also serve as a trace of HTTP communication, retained as a file. Let's allow ipwb to take HARs as input as well.
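To make the "trace of HTTP communication" concrete, here is a minimal sketch of reading the request/response pairs out of a HAR document. It relies only on the standard HAR layout (`log.entries[]`, each with a `request` and a `response`). The function name `har_exchanges` and the sample data are hypothetical, not part of ipwb.

```python
def har_exchanges(har):
    """Yield (method, url, status) for each entry of a parsed HAR document."""
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        yield request["method"], request["url"], response["status"]


# Hypothetical, hand-written sample; a real HAR carries many more fields
# (headers, timings, response content, and so on).
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "startedDateTime": "2017-07-03T23:33:54.579Z",
                "request": {"method": "GET", "url": "http://example.com/"},
                "response": {"status": 200},
            }
        ],
    }
}

for method, url, status in har_exchanges(sample_har):
    print(method, url, status)
```

Each entry roughly corresponds to what a WARC would store as a request/response record pair, which is why a HAR-to-WARC conversion is feasible.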
The quick solution is to convert the HAR to a WARC with https://github.com/webrecorder/har2warc, then proceed with the existing WARC indexing logic.
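A rough sketch of that flow, under stated assumptions: the names `is_har` and `ensure_warc` are hypothetical and not existing ipwb functions, and the converter is passed in as a callable so the sketch does not depend on any particular har2warc API. The idea is to detect a HAR input, convert it to a temporary WARC, and hand that path to the unchanged indexer.

```python
import json
import os
import tempfile


def is_har(path):
    """Return True if the file parses as JSON with a HAR-style log.entries list."""
    try:
        with open(path, "r", encoding="utf-8-sig") as fh:
            doc = json.load(fh)
    except (OSError, ValueError):
        # Unreadable, not valid text, or not JSON (e.g. a WARC): not a HAR.
        return False
    log = doc.get("log") if isinstance(doc, dict) else None
    return isinstance(log, dict) and isinstance(log.get("entries"), list)


def ensure_warc(path, convert):
    """Return a WARC path for `path`, converting it first if it is a HAR.

    `convert(har_path, warc_path)` is supplied by the caller. Assumption:
    har2warc exposes a function of this shape; verify against the
    installed version before wiring it in.
    """
    if not is_har(path):
        return path  # Assume it is already a WARC; index it as before.
    fd, warc_path = tempfile.mkstemp(suffix=".warc.gz")
    os.close(fd)
    convert(path, warc_path)
    return warc_path
```

The indexer would then call something like `ensure_warc(input_path, convert)` and index the returned path with the current logic. Note that `is_har` loads the whole file to test it, which is fine for a sketch but wasteful for large WARCs; checking the file extension or the first bytes would be cheaper.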