Open hanoii opened 5 years ago
Maybe you’d like https://warcreate.com ?
There is also the https://webrecorder.io/ project if you haven't looked at that.
It is totally feasible to do web archiving by browsing through warcprox, as discussed in #110. Warcreate or webrecorder are also good options. The best choice depends on the details of your use case.
I tried both webrecorder and warcreate and neither worked out of the box, at least with Facebook. I am following up on some issues on both projects.
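For anyone who wants to try the proxy route from a script rather than a browser, here is a minimal sketch. It assumes warcprox is already running locally on port 8000 (adjust to whatever address you started it with); `make_proxy_opener` is just an illustrative helper name, not part of warcprox. Only plain HTTP is shown, since HTTPS would also require the client to trust the CA certificate warcprox uses.

```python
import urllib.request


def make_proxy_opener(proxy_host="localhost", proxy_port=8000):
    """Build a urllib opener that routes HTTP and HTTPS requests through a proxy.

    The host and port are assumptions: point them at your own warcprox instance.
    """
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Example use (needs a running proxy, so it is left commented out):
# opener = make_proxy_opener()
# with opener.open("http://example.com/", timeout=30) as resp:
#     print(resp.status, len(resp.read()))
```

Anything fetched through the opener passes through the proxy, which is the same mechanism a browser uses once its proxy settings point at warcprox.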
@hanoii What are you using for playback? Even if the WARCs you make are fine, FB is a pain to play back properly.
I tried https://github.com/webrecorder/webrecorder-player. Once I actually got warcprox to work it did work fairly well. Still need to try a few things out.
Do you know by any chance whether WARCs store streamed video as well?
Yes, streamed videos will usually be stored in the WARC. Segmented videos are common these days, so playback is another question. It's possible pywb handles that already; I don't know.
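If you want to check whether a capture actually contains video, a rough sketch like the one below lists the response records whose HTTP `Content-Type` looks like video or a segmented-video manifest. It is a simplified stdlib-only reader, not a full WARC parser: it assumes well-formed records with a correct `Content-Length` and no folded header lines, and `iter_video_responses` and `VIDEO_TYPES` are names made up for this example. A dedicated WARC library will be more robust.

```python
import io

# Content types treated as video here (an illustrative, non-exhaustive list).
VIDEO_TYPES = (
    "video/",
    "application/vnd.apple.mpegurl",  # HLS playlist
    "application/x-mpegurl",          # HLS playlist (older type)
    "application/dash+xml",           # DASH manifest
)


def _read_headers(stream):
    """Read 'Name: value' lines up to a blank line; return a lowercase-keyed dict."""
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            return headers
        name, _, value = line.decode("utf-8", "replace").partition(":")
        headers[name.strip().lower()] = value.strip()


def iter_video_responses(stream):
    """Yield (url, content_type) for response records that look like video."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank lines separating records
        warc_headers = _read_headers(stream)
        block = stream.read(int(warc_headers.get("content-length", 0)))
        if warc_headers.get("warc-type") != "response":
            continue
        http = io.BytesIO(block)
        http.readline()  # skip the HTTP status line
        ctype = _read_headers(http).get("content-type", "")
        if ctype.lower().startswith(VIDEO_TYPES):
            yield warc_headers.get("warc-target-uri", ""), ctype
```

Pass it a binary file object for the WARC. For a `.warc.gz`, opening the file with `gzip.open(path, "rb")` should work, since Python's `gzip` module reads concatenated gzip members transparently. Finding the segments in the file says nothing about whether playback will stitch them together correctly.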
I am researching various archiving tools for a use case of local research and archiving. It should be easy for students to install and use, so I am trying to find the best tool out there.
My first question is whether you think this might end up working for such a use case.
The more I get into this, the more I feel that the browser itself should be in full control of the archiving process. If the browser can browse a site, it should be able to archive it (and even replay it) properly. I wonder whether you have ever considered this, whether you know of anyone who has, and what you think of the approach in general.
Specifically, I am speaking of taking the Chromium or Mozilla open source project and patching it or building on top of it.
This is just an attempt to gather some thoughts and opinions from people much more involved in archiving than I currently am, so anything is appreciated.
Thanks.