Closed rgaudin closed 3 years ago
I think this is related to openzim/warc2zim#80, which, unfortunately, I haven't gotten around to addressing yet. A fuzzy/prefix match is required in the ZIM replay to make the replay work.
The replay does work if you crawl to a WACZ, e.g.:
docker-compose run crawler crawl --url "https://mesquartierschinois.wordpress.com/page/" --limit 1 --generateWACZ
and then load it in replayweb.page, so I don't think it's an issue here.
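To make the fuzzy/prefix match mentioned above a bit more concrete, here is a minimal Python sketch. It is only an illustration, not the actual wabac.js or warc2zim logic: the function name `fuzzy_lookup`, the scoring rule (same host and path, then the most shared query parameters) and the example URLs are all assumptions.

```python
from urllib.parse import urlsplit, parse_qsl


def fuzzy_lookup(requested, archived_urls):
    """Return the archived URL that best matches `requested`, or None.

    Hypothetical helper: exact match first, then fall back to archived
    URLs sharing the same host and path (the "prefix"), preferring the
    one with the most query parameters in common.
    """
    if requested in archived_urls:
        return requested

    req = urlsplit(requested)
    req_prefix = (req.netloc, req.path)
    req_params = set(parse_qsl(req.query, keep_blank_values=True))

    best, best_score = None, -1
    for candidate in archived_urls:
        cand = urlsplit(candidate)
        if (cand.netloc, cand.path) != req_prefix:
            continue  # different host or path: not a prefix match
        cand_params = set(parse_qsl(cand.query, keep_blank_values=True))
        score = len(req_params & cand_params)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Made-up URLs, only to show the behaviour.
archived = [
    "https://example.com/videoplayback?id=abc&expire=111",
    "https://example.com/videoplayback?id=xyz&expire=111",
    "https://example.com/other?id=abc",
]
match = fuzzy_lookup("https://example.com/videoplayback?id=abc&expire=999", archived)
```

In this sketch the request differs from the captured URL only by a changing `expire` parameter, so an exact lookup would miss while the fallback still finds the first capture.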
Following up on https://github.com/openzim/zimit/issues/71.
Zimit 1.1.4 uses browsertrix-crawler:0.3.1 and wabac@2.7.3. With those, we seem to capture YouTube embeds, as there are matching files in the WARC/ZIM (hard to tell, actually) that pile up to 800MB+.
Here are a few of those entries in the ZIM, sorted by decreasing size:
The first thing you'd notice is that every one of these entries shares its exact byte size with several others. It's safe to assume that's not a coincidence and that we are bundling each video's data 3+ times. Note that the log is full of revisit entries.
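For what it's worth, the "same byte size" observation can be checked mechanically. Below is a minimal Python sketch, assuming the entries have already been exported as `(path, size)` pairs; the helper name `group_by_size` and the sample paths and sizes are made up for illustration and are not taken from the ZIM.

```python
from collections import defaultdict


def group_by_size(entries, min_copies=2):
    """Group (path, size) pairs by size, keeping sizes seen at least
    `min_copies` times, largest size first.

    Hypothetical helper: identical sizes only suggest duplicated
    payloads; comparing content hashes would be needed to confirm.
    """
    by_size = defaultdict(list)
    for path, size in entries:
        by_size[size].append(path)
    return {
        size: paths
        for size, paths in sorted(by_size.items(), reverse=True)
        if len(paths) >= min_copies
    }


# Placeholder data, not real entries from the ZIM.
sample = [
    ("video/a?range=1", 5000),
    ("video/a?range=2", 5000),
    ("video/a?range=3", 5000),
    ("video/b", 1200),
    ("style.css", 300),
    ("video/c?x=1", 800),
    ("video/c?x=2", 800),
]
suspects = group_by_size(sample)
```

Any size that shows up here with three or more paths would be a candidate for the duplicated video data described above.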
Now, that's not the issue I'm after here. The problem is that those videos don't replay. I'm not sure what kind of feedback I should provide here. Please let me know. You can download the ZIM from here (expires in a week).
Source URL: https://mesquartierschinois.wordpress.com/