leon opened this issue 3 years ago
This is worth looking into, but bear in mind that timesnap uses page.evaluate to transfer canvas data. According to the puppeteer docs, page.evaluate requires a function that returns a serializable object, which, from my understanding, Blobs and ArrayBuffers are not.
Also note that toBlob is asynchronous, while toDataURL is synchronous, so even if a suitable method of data transfer is found, this would require a bit of a rewrite. The asynchronous nature also makes me think that gains are not guaranteed, so this would require some performance testing.
After doing a quick search (see puppeteer/puppeteer#3722), there doesn't seem to be a direct way to return a Blob or ArrayBuffer from puppeteer, and the suggested solution there converts the blob to a binary string.
There might be some performance gains from going the blob -> binary string route instead of transferring a base64 string, though I'm skeptical, since it would use two asynchronous methods (toBlob and FileReader.onload). I imagine the gains of this route would come primarily from transferring the data, as the DevTools protocol can be a bottleneck, and to a lesser extent from skipping the base64 encoding/decoding. If you're interested, you can implement this, test it out, and report the performance differences.
I checked into it as well, and as you say, since the DevTools Protocol is based on JSON-RPC, we are stuck with having to serialize everything to strings, which is really bad for performance.
I have a few suggestions that could be explored to work around the DevTools protocol.
We start an HTTP server which can stream the Uint8 PNG files via window.fetch (https://web.dev/fetch-upload-streaming/#writable-streams). Since everything is on localhost it should be very fast, but it means we need to launch a web server in parallel with Chrome and make sure everything is working correctly there.
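A hedged sketch of the in-page half of that idea (the port and endpoint are made up, and the collector server would have to handle CORS since the page and server are different origins):

```js
// Hypothetical in-page upload of one frame to a local collector server.
async function sendFrame(canvas, frameIndex) {
  const blob = await new Promise((resolve) => canvas.toBlob(resolve, 'image/png'));
  // A Blob body is sent as raw bytes: no base64 and no JSON serialization.
  await fetch(`http://localhost:8080/frames/${frameIndex}.png`, {
    method: 'POST',
    body: blob,
  });
}
```

The Node side would just be a small http server, started alongside Chrome, that writes each request body to disk.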
Another option is the File System Access API, so the page writes the frames to disk itself:

```js
// `blobcontent` is the PNG Blob produced by canvas.toBlob.
const dirHandle = await window.showDirectoryPicker();
const screenshotDirHandle = await dirHandle.getDirectoryHandle('screenshots', { create: true });
const screenshotFileHandle = await screenshotDirHandle.getFileHandle('screenshot.png', { create: true });
const writable = await screenshotFileHandle.createWritable();
// Write the contents of the file to the stream.
await writable.write(blobcontent);
// Close the file and write the contents to disk.
await writable.close();
```
The problem is that we need to use HTTPS, otherwise window.showDirectoryPicker isn't available. It also requires the user to pick a directory and allow us to read and write to it.
I've seen Puppeteer code that enables other kinds of interaction with dialog boxes and file pickers, so this might work.
It might be a bit more work to be allowed to write the files, but then it should be as fast as saving any file.
My knowledge of the DevTools Protocol is minimal, so it might already be possible, but we could also try asking them how best to transfer the data.
What do you think, are any of these worth pursuing?
I created a lab for the File Handling API.
https://github.com/leon/labs-chrome-file-write
It is still rather slow: it took ~800 ms when trying to save ten 10 MB files in parallel.
In contrast, I wrote a bash version which could do it in 63 ms.
So there is certainly some overhead in Chrome's implementation.
Using a different transfer protocol is an interesting idea, though something like a web server is probably beyond the scope of this project.
If canvas capturing and file storing happen outside of puppeteer, you probably don't even need puppeteer. There's timeweb, which runs directly in the browser and would probably yield a cleaner implementation; it's meant for custom projects. I haven't gotten around to writing a tutorial for it yet, though.
As far as new or upcoming Chromium features go, I'm currently pinning timesnap's puppeteer dependency to version 2.1, which installs Chromium 80.0.3987.0, because subsequent versions seem to require additional libraries to be installed.
Capture speed isn't really the goal of this project -- there are bottlenecks in puppeteer and JavaScript that make it difficult to optimize, but I'm open to optimizations that don't require many changes.
When we save the images as PNGs, the current approach has to base64-encode them in the page before we convert them back to a Buffer. But with toBlob we get the raw PNG/JPEG data directly and can skip the base64 step.
https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/toBlob
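A small sketch of the in-page half, assuming the surrounding capture code passes in the canvas; the open question from this thread is still how to get the resulting bytes out of the page, since a Uint8Array can't be returned through page.evaluate either:

```js
// Get the raw encoded bytes in-page with toBlob, skipping base64 entirely.
function canvasToBytes(canvas, type = 'image/png') {
  return new Promise((resolve, reject) => {
    canvas.toBlob(async (blob) => {
      if (!blob) return reject(new Error('canvas.toBlob returned null'));
      resolve(new Uint8Array(await blob.arrayBuffer()));
    }, type);
  });
}
```

Those bytes would still need one of the transports discussed above (binary string, fetch to a local server, or file writing) to leave the page.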