harvard-lil / scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
MIT License

Archiving an archive? #371

Closed: edsu closed this issue 4 weeks ago

edsu commented 4 weeks ago

I realize this isn't a common use case, but I tried using scoop to archive a page in the Internet Archive Wayback Machine:

$ scoop https://web.archive.org/web/20051221165217if_/https://ldodds.com/writing/

You can find the WACZ here:

https://edsu-webarchives.s3.amazonaws.com/tmp/ldodds.wacz

I noticed that the screenshot looks fine:

[Screenshot 2024-10-18 at 11:37:24 AM]

But replay gets confused by some kind of recursive loop!

[Screenshot 2024-10-18 at 11:34:02 AM]

Is there any way around this?

matteocargnelutti commented 4 weeks ago

Hi @edsu 👋 ! I think this is a known playback issue with replayweb.page? Probably worth checking with folks at Webrecorder. Cheers!

edsu commented 4 weeks ago

@ikreymer just let me know that using the id_ view instead of the if_ view works OK!

https://replayweb.page/?source=https%3A%2F%2Fedsu-webarchives.s3.amazonaws.com%2Ftmp%2Fldodds.wacz#view=pages&url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20051221165217id_%2Fhttps%3A%2F%2Fldodds.com%2Fwriting%2F&ts=20241018194358
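A minimal Python sketch of this workaround, assuming Wayback capture URLs follow the shape https://web.archive.org/web/<timestamp><modifier>/<original URL>, where the modifier is an optional two-letter code plus "_" (such as if_ or id_). It swaps the modifier to id_ before the URL is handed to scoop, then rebuilds a replayweb.page link with the same layout as the one above. The helper names `to_id_view` and `replay_link` are hypothetical and belong to neither scoop nor replayweb.page; the link format is copied from the link shared in this thread, not from any documented replayweb.page API.

```python
import re
from urllib.parse import quote

# Assumed Wayback URL shape: https://web.archive.org/web/<timestamp><modifier>/<original URL>,
# where <modifier> is an optional two-letter code followed by "_" (e.g. "if_", "id_").
WAYBACK_RE = re.compile(r"^(https?://web\.archive\.org/web/)(\d{1,14})(?:[a-z]{2}_)?/(.+)$")


def to_id_view(wayback_url: str) -> str:
    """Return the same capture URL with its view modifier replaced by "id_"."""
    match = WAYBACK_RE.match(wayback_url)
    if match is None:
        raise ValueError(f"not a recognized Wayback Machine URL: {wayback_url}")
    prefix, timestamp, original = match.groups()
    return f"{prefix}{timestamp}id_/{original}"


def replay_link(wacz_url: str, page_url: str, ts: str) -> str:
    """Build a replayweb.page link shaped like the one shared in this thread."""
    # Both URLs are percent-encoded in full, including ":" and "/".
    source = quote(wacz_url, safe="")
    page = quote(page_url, safe="")
    return f"https://replayweb.page/?source={source}#view=pages&url={page}&ts={ts}"


if __name__ == "__main__":
    page = to_id_view("https://web.archive.org/web/20051221165217if_/https://ldodds.com/writing/")
    print(page)  # https://web.archive.org/web/20051221165217id_/https://ldodds.com/writing/
    # Reproduces the replayweb.page link above for the WACZ shared earlier in the thread.
    print(replay_link("https://edsu-webarchives.s3.amazonaws.com/tmp/ldodds.wacz", page, "20241018194358"))
```

The rewritten URL is what would be passed to `scoop` in place of the if_ one; the regex is kept deliberately narrow so that anything that does not look like a Wayback capture URL raises an error instead of being silently altered.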

rebeccacremona commented 4 weeks ago


@edsu For what it's worth, I get "Archived Page Not Found" when following that link.


edsu commented 3 weeks ago

Weird, I don't see that...

[Screenshot 2024-10-19 at 7:11:56 PM]