internetarchive / wayback-machine-webextension

A web browser extension for Chrome, Firefox, Edge, and Safari 14.
GNU Affero General Public License v3.0

Allow users to create Web ARChive (WARC) files #546

Open cgorringe opened 4 years ago

cgorringe commented 4 years ago

It was suggested that the extension include a function to save the currently viewed web page as a Web ARChive (WARC) file locally on the user's computer. This could be a feature for a future version of the extension.

Relevant Links:

Roushangopal commented 4 years ago

I am willing to work on this problem if it's still open. May I?

cgorringe commented 4 years ago

Yes, it's still open and no one is working on it currently!

We use Slack internally to communicate. You're welcome to join by emailing our director mark@archive.org who can get you an invite. Just let him know that you'd like to work on this (include a link to this issue).

Thank you.

cgorringe commented 3 years ago

To add to this feature idea, quoting this Tweet by @mekarpeles:

Very few browsers have any reasonable mode for auto-caching websites you visit for offline.

Look, if I visit a web page once, I want it literally forever. And I want all its links archived too (because spatial & temporal locality: I'm likely to revisit a thing & things nearby)

I think this would be a neat feature to be able to auto-save websites that you visit to the local filesystem, which can be viewed later offline. Could be implemented using WARC files?
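To make the WARC idea concrete, here is a minimal sketch of how a single saved page could be wrapped in one WARC 1.0 "resource" record. It is only an illustration under stated assumptions: the function name `buildWarcResourceRecord` and its parameters are hypothetical and not part of the extension, the page is assumed to be available as an HTML string, and the record ID is assumed to be a UUID supplied by the caller.

```typescript
// Hypothetical helper (not part of the extension): wraps an HTML string in a
// single WARC 1.0 "resource" record and returns the raw bytes.
function buildWarcResourceRecord(
  targetUri: string,
  html: string,
  recordId: string, // assumed to be a UUID chosen by the caller
  date: Date
): Uint8Array {
  const encoder = new TextEncoder();
  const body = encoder.encode(html); // payload as UTF-8 bytes

  // WARC 1.0 timestamps are UTC with second precision, so drop milliseconds.
  const warcDate = date.toISOString().replace(/\.\d{3}Z$/, "Z");

  // Header lines end with CRLF; an empty line separates header and payload.
  const header = [
    "WARC/1.0",
    "WARC-Type: resource",
    `WARC-Record-ID: <urn:uuid:${recordId}>`,
    `WARC-Date: ${warcDate}`,
    `WARC-Target-URI: ${targetUri}`,
    "Content-Type: text/html; charset=utf-8",
    `Content-Length: ${body.length}`, // length in bytes, not characters
    "",
    "",
  ].join("\r\n");

  const head = encoder.encode(header);
  const tail = encoder.encode("\r\n\r\n"); // each record ends with two CRLFs

  const record = new Uint8Array(head.length + body.length + tail.length);
  record.set(head, 0);
  record.set(body, head.length);
  record.set(tail, head.length + body.length);
  return record;
}
```

A WARC file is a concatenation of such records, so several pages could be appended one after another. A fuller capture would probably also need "response" records containing the original HTTP headers, plus records for images, stylesheets, and scripts, none of which this sketch attempts.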

sarthakkundra commented 3 years ago

@cgorringe

I think this would be a neat feature to be able to auto-save websites that you visit to the local filesystem, which can be viewed later offline. Could be implemented using WARC files?

Maybe a session storage list of all the visited websites? Those would only be links, though. Or do you want the whole web page to be stored?

cgorringe commented 3 years ago

Yeah I was thinking actual whole web pages stored locally. A list of visited websites can already be accomplished by not clearing the browser's history.

sarthakkundra commented 3 years ago

@cgorringe Session storage only allows about 5 MB. Won't that be too little for this task?

cgorringe commented 3 years ago

@sarthakkundra Yeah, 5 MB is way too small for this.

This "File System Access API" appears to be new. I couldn't tell whether it has file size restrictions. I haven't tried it and don't know which browsers, if any, support it yet. https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API https://web.dev/file-system-access/
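For reference, a rough sketch of how that API could write an archive to disk is shown below. It is an untested assumption that this works from an extension page. The helper name `saveBytesToDisk` and the small interfaces are hypothetical, and the picker is passed in as a parameter so the function can be exercised without a browser. In a supporting browser the default is the global `showSaveFilePicker`, which needs a secure context and a user gesture. At the time of writing, MDN lists the picker methods as available only in Chromium-based browsers, so a feature check is included.

```typescript
// Minimal structural types for the parts of the File System Access API used here.
interface WritableLike {
  write(data: Uint8Array): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  createWritable(): Promise<WritableLike>;
}
type SavePicker = (options: { suggestedName: string }) => Promise<FileHandleLike>;

// Hypothetical helper (not part of the extension): asks the user where to save
// and writes the given bytes to the chosen file.
async function saveBytesToDisk(
  bytes: Uint8Array,
  suggestedName: string,
  picker: SavePicker = (globalThis as any).showSaveFilePicker
): Promise<void> {
  // Feature check: browsers without the API leave showSaveFilePicker undefined.
  if (typeof picker !== "function") {
    throw new Error("File System Access API is not available in this browser");
  }
  const handle = await picker({ suggestedName }); // opens the save dialog
  const writable = await handle.createWritable(); // writable stream to the file
  await writable.write(bytes);
  await writable.close(); // the file is only committed on close
}
```

Where the API is missing, a fallback such as the extension `downloads` API would be needed, which this sketch does not cover.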

sarthakkundra commented 3 years ago

@cgorringe This looks interesting. I'll have a look. Meanwhile, is there a Slack, Gitter, IRC channel, etc. that the community uses for communication, or does everything happen on GitHub only? I'd love to join if there's one to discuss other ideas as well.

public-rant commented 1 year ago

Has there been any progress on this issue? Anything interesting to report?

cgorringe commented 1 year ago

@public-rant No progress currently, sorry.