Closed apataga closed 2 years ago
Thanks, will see if there's anything that can be done. It's a bit tricky since the entire file must be read in the browser.
For much better results, the recommendation is to convert the WARC files there into a WACZ file, which can be done using the Python py-wacz tool (https://github.com/webrecorder/wacz-format/tree/main/py-wacz). The web archives could then be read quite quickly without needing to load the entire file.
Perhaps that's even something that ArchiveBot Viewer could do automatically...
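To illustrate why an index avoids loading the whole file, here is a minimal Python sketch using only the standard library. It is not the WACZ format or the py-wacz implementation: the helper names (`make_record`, `build_index`, `read_record`) are hypothetical, and it assumes a simplified, uncompressed WARC-style layout. The file is scanned once to record byte offsets, after which any single record can be fetched with a seek.

```python
import io

def make_record(uri: str, body: bytes) -> bytes:
    """Build a minimal, uncompressed WARC-style record (illustrative only)."""
    headers = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # Each record ends with two CRLF pairs after the body.
    return headers + body + b"\r\n\r\n"

def build_index(stream) -> dict:
    """Scan the stream once, mapping each target URI to (offset, length)."""
    index = {}
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break  # end of file
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a record start at byte {offset}")
        uri, content_length = None, 0
        # Read header lines until the blank line that ends the header block.
        while True:
            header = stream.readline()
            if header in (b"\r\n", b""):
                break
            name, _, value = header.decode("utf-8").partition(":")
            if name.lower() == "warc-target-uri":
                uri = value.strip()
            elif name.lower() == "content-length":
                content_length = int(value)
        # Skip the body and trailing CRLFCRLF without reading them.
        stream.seek(content_length + 4, io.SEEK_CUR)
        index[uri] = (offset, stream.tell() - offset)
    return index

def read_record(stream, index: dict, uri: str) -> bytes:
    """Fetch one record by seeking directly to its stored offset."""
    offset, length = index[uri]
    stream.seek(offset)
    return stream.read(length)

# Toy in-memory "WARC" with two records (hypothetical URLs).
warc = io.BytesIO(
    make_record("https://example.com/a", b"first page")
    + make_record("https://example.com/b", b"second page")
)
index = build_index(warc)
record = read_record(warc, index, "https://example.com/b")
```

Once the index exists, it can be stored alongside the archive, so later reads only touch the bytes of the records actually requested.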
I was able to load one of the larger WARCs on an OSX 1.5.2 player, so it seems like it may be a bit less predictable.
Understood. Maybe you can explain how to restore gamerankings.com as an offline site? To be able to search by gaming platforms, sort by date, etc. All links on the Internet about opening .warc files lead to your programs. :)
I am using Chrome in Windows and it is happening for a 340 MB WARC file. Mine is from archive.org.
Hi! I'm also having trouble with the app. I'm always getting stuck at 77%. I'm using the app from here and also Chrome. I am trying to open a 5 GB WARC file.
Yeah, you either have to use WACZ, which adds the index to the file (don't know why it wasn't in the original spec), or you must use pywb to index and play it back on your own website. Is there a reason why you must use a separate website, rather than hosting your own?
We would like to support this use case with replayweb.page as well, separate from pywb, so definitely hope to fix this! I haven't been able to reproduce this issue consistently yet and am hoping we can improve this a bit.
Would also be great to be able to offer WARC->WACZ conversion in the browser, which would require loading the WARC fully at least once.
Also, running pywb requires someone to run a web server, while replayweb.page can host a web archive from any static storage, so these are slightly different use cases.
For anyone having issues with WARCs getting stuck loading, can you try loading on this dev version at: https://dev.replayweb.page/
This version uses a web worker to do the loading, and also lists the actual number of records loaded alongside the percentage. I'd be curious to know if:
The percentage reflects how much of the WARC's total size has been loaded; however, it does not progress uniformly with time. E.g. a WARC with 100 small records in 1MB will probably take longer than a single record of 100MB. I am curious if the WARC is actually getting stuck, or if it's just loading very slowly, which would help determine how best to address the issues.
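As a rough illustration of that point, the following Python sketch compares byte-based and record-based progress for a made-up WARC layout. The record sizes and the `progress` helper are assumptions for illustration only, not measurements from replayweb.page.

```python
MB = 1024 * 1024
KB = 1024

# Hypothetical layout: one 100 MB record followed by 100 records of 10 KB each.
record_sizes = [100 * MB] + [10 * KB] * 100

def progress(sizes, records_done):
    """Return (percent of bytes read, percent of records processed)."""
    bytes_pct = 100 * sum(sizes[:records_done]) / sum(sizes)
    records_pct = 100 * records_done / len(sizes)
    return bytes_pct, records_pct

# State after only the first (large) record has been loaded.
bytes_pct, records_pct = progress(record_sizes, 1)
print(f"after 1 record: {bytes_pct:.1f}% of bytes, {records_pct:.1f}% of records")
```

In this made-up case the byte percentage is already around 99% after the first record, while about 99% of the records are still to be processed. If per-record work dominates, a loader could sit near a high percentage for a while without actually being stuck.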
This is now up on the main replayweb.page. I have not been able to detect WARCs getting completely stuck, so closing this for now. Occasionally, it can be slow, however, especially on Firefox. Chrome/Chromium-based browsers seem to load much more quickly, which may be a separate issue to investigate.
Hello! The Replayweb.page app (v1.4.0 to 1.5.2 on Windows 10 20H2) stops at 30-40% when loading any .warc file > 1GB from this save of gamerankings.com: https://archive.fart.website/archivebot/viewer/job/9uxhl At the same time, a 650MB file loads completely. And your past program (Webrecorder player) loads any .warc file but is much slower.