webrecorder / replayweb.page

Serverless replay of web archives directly in the browser
https://replayweb.page
GNU Affero General Public License v3.0

[Feature]: Add size limit to the local file picker when loading WARC files or local file-system-based WACZ files #328

Open ikreymer opened 5 months ago

ikreymer commented 5 months ago

Context

When loading any WARC file, or loading a WACZ file on a system that doesn't support window.showOpenFilePicker / File System Access API (non-Chrome-based browsers), there should be a limit on the size of the file that will be loaded. This is because loading a WARC file always requires reading the entire file into memory and indexing it. Loading a WACZ file using the default FileReader API also has the effect of requiring the entire WACZ to be read into memory, since there is no way to seek into the file. For this reason, we should restrict which files can be loaded in this way, as it's easy to use up all available RAM when loading a multi-GB file.

The window.showOpenFilePicker / File System Access API available in Chrome allows seeking within files on disk without loading the whole file, so in this case a WACZ file of any size can be loaded. Of course, WACZ files loaded over HTTP can also be of any size, as only the necessary data is loaded.
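The rule described above could be sketched roughly as follows. This is only an illustration, not existing replayweb.page code: the names `canLoadLocalFile`, `LocalFileInfo`, and `isWacz`, the extension-based WACZ check, and the 2 GiB value of `MAX_IN_MEMORY_BYTES` are all assumptions, since the issue does not specify a limit.

```typescript
// Hypothetical limit: the issue does not specify a value.
const MAX_IN_MEMORY_BYTES = 2 * 1024 ** 3;

// Minimal description of a picked file (mirrors File.name / File.size).
type LocalFileInfo = {
  name: string;
  size: number; // size in bytes
};

// Assumption: a file is treated as WACZ purely by its extension.
function isWacz(name: string): boolean {
  return name.toLowerCase().endsWith(".wacz");
}

// Returns true if the local file may be loaded.
function canLoadLocalFile(
  file: LocalFileInfo,
  hasFileSystemAccess: boolean,
  maxBytes: number = MAX_IN_MEMORY_BYTES,
): boolean {
  // WACZ opened via the File System Access API is seekable on disk,
  // so no size limit applies.
  if (isWacz(file.name) && hasFileSystemAccess) {
    return true;
  }
  // WARC files, and WACZ files without seekable access, are read fully
  // into memory, so the size limit applies.
  return file.size <= maxBytes;
}

// Feature detection for the File System Access API picker.
const hasFileSystemAccess = "showOpenFilePicker" in globalThis;
```

WACZ files loaded over HTTP are not covered by this check, since only the necessary data is fetched in that case.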

What change would you like to see?

As a user, I want to be notified when loading a large local WACZ or WARC file would use up too much RAM, and to be prevented from doing so. Ideally, a solution is also suggested (e.g. convert to WACZ, or host the file via an HTTP server instead of loading it locally).

Requirements

When loading a file that is too large, the standard error message should be shown, informing the user that the file cannot be loaded.
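As one possible shape for the text of that message, here is a small sketch. The helper names `formatBytes` and `fileTooLargeMessage` and the exact wording are hypothetical. How the message is passed to the existing error display is left out.

```typescript
// Format a byte count using binary units, e.g. 1536 -> "1.5 KB".
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}

// Build the error text for a local file that exceeds the limit,
// including the workarounds suggested in this issue.
function fileTooLargeMessage(
  name: string,
  size: number,
  maxBytes: number,
): string {
  return (
    `"${name}" (${formatBytes(size)}) cannot be loaded: local files that ` +
    `must be read fully into memory are limited to ${formatBytes(maxBytes)}. ` +
    `Try converting it to WACZ, or host it via an HTTP server instead.`
  );
}
```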
