Closed ktorn closed 2 weeks ago
You are missing the trailing slash, so it's not a full URL. If you do:
captureWebPage('https://arkivo.art/', 'arkivo.warc');
It will work. (Note that copying the gist will not work because it has \r stripped out.)
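To guard against this in your own code, you could normalize the URL before passing it on. This is only a minimal sketch: `normalizeUrl` is a hypothetical helper name, not something this library provides, and it assumes the global `URL` class available in Node 10 and later. The WHATWG URL parser serializes an empty path on an http(s) URL as `/`, so a bare origin comes back with the trailing slash added.

```javascript
// Hypothetical helper (not part of the library): return the fully
// serialized form of a URL so that a bare origin gains its trailing slash.
function normalizeUrl(input) {
  // new URL() throws a TypeError on input that is not an absolute URL,
  // which also catches typos before any capture is attempted.
  return new URL(input).href;
}

console.log(normalizeUrl('https://arkivo.art'));  // https://arkivo.art/
console.log(normalizeUrl('https://arkivo.art/')); // https://arkivo.art/ (unchanged)
```

The result could then be passed to your own capture function, for example `captureWebPage(normalizeUrl('https://arkivo.art'), 'arkivo.warc')`.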
Also, a quick follow-up question: is there a way to automatically fetch all the assets used by the page (CSS, JS, etc.)? My end goal is to fully capture a code-based artwork (like this one), including network requests initiated by the piece.
This library is designed to be a low-level tool for writing WARC files. To do what you want, you need to capture through the browser. We have several tools that do this:
For interactive artworks, the extension would be your best option, since you can interact with the browser and have all of the network traffic be captured. For more help / discussion, check out our forum at https://forum.webrecorder.net/
It worked, thanks!
I need to do this programmatically, from a Node app, but I will investigate Browsertrix, thanks!
Also added a fix in #77 that will add a trailing slash if it's missing. Closing this as the original issue is answered.
Hi,
I'm trying to test the basic scenario of capturing a URL and saving it as a WARC file.
The code below kind of works: the file is created (see gist), but when I try to open the WARC file in ReplayWeb.page, it says "Archived Page Not Found".
Any ideas how to troubleshoot this?
Also, a quick follow-up question: is there a way to automatically fetch all the assets used by the page (CSS, JS, etc.)? My end goal is to fully capture a code-based artwork (like this one), including network requests initiated by the piece.
My code: