jlarmstrongiv closed this issue 3 years ago
It appears capture methods are explicitly off the roadmap. It is certainly possible to create a custom writer that follows the example in the README to work with puppeteer.
There are two main ways to read and write WARC files in Node.js: warcio.js and node-warc.
Between these two options, there will be differences in the metadata. For example, warcio.js automatically includes the content and block hashes.
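To make the hash difference concrete, here is a minimal sketch of how the two digests of an HTTP response record can be computed with Node's built-in `crypto` module. It assumes SHA-256 and the `algorithm:value` label form; `labelledDigest` and `responseDigests` are hypothetical helper names, not part of warcio.js or node-warc, and the exact algorithm and encoding each library emits may differ.

```javascript
const crypto = require("node:crypto");

// Format a digest as "algorithm:value" (assumption: SHA-256, hex-encoded).
function labelledDigest(bytes) {
  const hex = crypto.createHash("sha256").update(bytes).digest("hex");
  return `sha256:${hex}`;
}

// Hypothetical helper: the block digest covers the whole record block
// (HTTP headers plus body), the payload digest covers the body only.
function responseDigests(httpHeaderBytes, bodyBytes) {
  const block = Buffer.concat([httpHeaderBytes, bodyBytes]);
  return {
    "WARC-Block-Digest": labelledDigest(block),
    "WARC-Payload-Digest": labelledDigest(bodyBytes),
    // Content-Length counts the bytes of the block, not just the body.
    "Content-Length": String(block.length),
  };
}

const httpHeaders = Buffer.from(
  "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n",
  "utf8"
);
const body = Buffer.from("some text", "utf8");
const digests = responseDigests(httpHeaders, body);
console.log(digests);
```

A library that omits these headers still produces a readable file, but tools that verify digests have nothing to check against.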
However, neither of them fully passes the WARC validation tests (from warcat and warctools). Both have content-length mismatches, though warcio.js fails more tests.
Subtle differences also mean that the Unarchiver app can only decompress files from node-warc. However, other tools such as RelayWeb can open files from both libraries.
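For anyone wanting to reproduce a content-length mismatch without installing a validator, the check can be sketched in plain Node. This is not how warcat or warctools work internally; it only assumes the record layout from the WARC spec (header lines, a blank line, `Content-Length` bytes of block, then two CRLFs) and handles uncompressed files only, so gzipped records would need to be decompressed first. `buildRecord` and `checkContentLengths` are hypothetical names used for the demo.

```javascript
const CRLF2 = Buffer.from("\r\n\r\n");

// Build one uncompressed record; declaredLength lets the demo fake a mismatch.
function buildRecord(type, block, declaredLength = block.length) {
  const head =
    "WARC/1.1\r\n" +
    `WARC-Type: ${type}\r\n` +
    `Content-Length: ${declaredLength}\r\n` +
    "\r\n";
  return Buffer.concat([Buffer.from(head), block, CRLF2]);
}

// Walk the buffer record by record and report the first length that
// does not land on a record boundary.
function checkContentLengths(warc) {
  const problems = [];
  let offset = 0;
  let index = 0;
  while (offset < warc.length) {
    if (warc.toString("latin1", offset, offset + 5) !== "WARC/") {
      problems.push(`record ${index}: expected "WARC/" at byte ${offset}`);
      break;
    }
    const headerEnd = warc.indexOf(CRLF2, offset);
    if (headerEnd === -1) {
      problems.push(`record ${index}: header is not terminated`);
      break;
    }
    const header = warc.toString("latin1", offset, headerEnd);
    const match = /^Content-Length:\s*(\d+)\s*$/im.exec(header);
    if (!match) {
      problems.push(`record ${index}: missing Content-Length`);
      break;
    }
    const blockEnd = headerEnd + 4 + Number(match[1]);
    // A correctly sized block is followed by exactly two CRLFs.
    if (!warc.subarray(blockEnd, blockEnd + 4).equals(CRLF2)) {
      problems.push(
        `record ${index}: Content-Length ${match[1]} does not end at a record boundary`
      );
      break;
    }
    offset = blockEnd + 4;
    index += 1;
  }
  return problems;
}

const good = Buffer.concat([
  buildRecord("warcinfo", Buffer.from("software: demo\r\n")),
  buildRecord("resource", Buffer.from("some text")),
]);
const bad = Buffer.concat([
  buildRecord("resource", Buffer.from("some text"), 5), // wrong on purpose
  buildRecord("resource", Buffer.from("more text")),
]);

console.log(checkContentLengths(good));
console.log(checkContentLengths(bad));
```

The walk stops at the first bad record because every later offset depends on the lengths before it, which is also why a single wrong `Content-Length` can make the rest of a file unreadable for strict tools.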
That's definitely something that should be fixed ASAP. If there are incorrect lengths, can you provide an example WARC and open a new issue?
Yes, direct support for Puppeteer is out of scope for this library, since it is supposed to be fairly low-level and also work in the browser, but I can see about adding an example.
No worries, I’ll go ahead and close this issue. Thanks!
Would love an example for generating a warc via puppeteer 😄