oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
MIT License

Support HAR files #210

Open machawk1 opened 7 years ago

machawk1 commented 7 years ago

The ipwb indexer currently supports WARC files as input. HTTP Archive (HAR) files also capture a trace of HTTP communication and retain it as a file. Let's allow ipwb to take HAR files as input as well.

The quick solution is to convert the HAR to a WARC with https://github.com/webrecorder/har2warc, then proceed with the current indexing logic.
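A minimal sketch of that convert-then-index flow, assuming a HAR can be recognized as JSON with a top-level `log.entries` list. The names `looks_like_har` and `prepare_for_indexing` are hypothetical and not part of ipwb; the converter is passed in as a callable so the sketch does not assume any particular har2warc function signature.

```python
import json
from pathlib import Path
from typing import Callable


def looks_like_har(path: str) -> bool:
    """Return True if the file parses as JSON with a top-level log.entries list."""
    try:
        with open(path, "r", encoding="utf-8-sig") as f:
            data = json.load(f)
    except (ValueError, OSError):
        # Not valid JSON or not decodable text (e.g. a WARC or gzipped WARC),
        # or the file could not be read.
        return False
    log = data.get("log") if isinstance(data, dict) else None
    return isinstance(log, dict) and isinstance(log.get("entries"), list)


def prepare_for_indexing(path: str, convert: Callable[[str, str], None]) -> str:
    """Return a path the existing WARC indexer can consume.

    `convert(har_path, warc_path)` is a hypothetical hook that is expected to
    write a WARC file at `warc_path`.
    """
    if not looks_like_har(path):
        return path  # assume it is already a WARC; leave it untouched
    warc_path = str(Path(path).with_suffix(".warc.gz"))
    convert(path, warc_path)
    return warc_path
```

If har2warc exposes a two-argument conversion function (worth checking against its README), it could be passed as `convert`, and the returned path handed to the existing indexer unchanged.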

ibnesayeed commented 7 years ago

I think har2warc is the way to go. No need to duplicate the work already done.

machawk1 commented 7 years ago

Related: https://github.com/jbenet/http2ipfs

machawk1 commented 7 years ago

Currently, when given a HAR file, the indexer just emits the CDXJ header lines and then exits, because it finds no WARC records:

!context ["http://oduwsdl.github.io/contexts/cdxj"]
!meta {"created_at": "2017-07-03T23:33:54.579613", "generator": "InterPlanetary Wayback v.0.2017.06.29.1331"}
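A header-only result like the one above can be detected programmatically. This is a small sketch, assuming (based only on the output shown) that CDXJ header lines start with `!` and every other non-empty line is an index record. The function name `cdxj_has_records` is hypothetical.

```python
def cdxj_has_records(cdxj_text: str) -> bool:
    """Return True if any non-empty line is not a '!'-prefixed header line."""
    for line in cdxj_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("!"):
            return True
    return False
```

A check like this could let the indexer warn that the input produced no records instead of exiting silently.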