machawk1 / warcreate

Chrome extension to "Create WARC files from any webpage"
https://warcreate.com
MIT License
207 stars 13 forks source link

Amount of content saved is dependent on how long the popup is open #95

Open machawk1 opened 7 years ago

machawk1 commented 7 years ago

I encountered an issue when archiving this page, which has many images. If I open the page, click the WARCreate icon, then immediately click the Generate WARC button, many images are missing from the WARC. If I instead click to open the popup, wait a while, then click the button, the WARC contains many more images.

This likely has to due with the code behind the popup gathering/re-fetching all images in the buffer when the popup is opened. If the button is clicked before all images are re-fetched, only those that have been fetched already will be written.

  1. It's already bad that we have to re-fetch content but I believe there is still no way to read the payload as it comes in over the wire.
  2. The button should not be enabled until the fetch procedure is complete. This may include failing of images fetched. Use promises.
machawk1 commented 7 years ago

The await and Promises are not working as expected. Execution proceeds even if the promise has not resolved.

N0taN3rd commented 7 years ago

@machawk1 The issue may be with logic used to know when to call

saveAs(new Blob(arrayBuffers), fileName)

from warcGenerator.js. First instance lines 168 - 176. Second instance lines 321 - 328.

Perhaps putting content.js into the background scripts to allow it just work for all tabs keeping track per tab basis??

machawk1 commented 7 years ago

@N0taN3rd Ideally that would be the way it's done, so we would not be dependent on whether the popup is open (and perhaps just indicate processing in the icon). The content script does allow us to have access to the DOM, so maybe retaining the additional capabilities of the content script might be a good thing. Can you take a stab at this?

N0taN3rd commented 7 years ago

@machawk1 will start stabbing at it. Any other issues I should attempt to address while I'm at it? Also :+1: :-1: to bring in communication with WAIL while I'm at it?

Sent from my Samsung SM-N920V using FastHub

machawk1 commented 7 years ago

@N0taN3rd 👍 , also #7 is high on the list, which your edits with the above might assist in being remedied.

machawk1 commented 5 years ago

Circling back to this, the two calls to saveAs are problematic, particularly when the asynchronouslyFetchImageData() in warcGenerator.js does not seem to be doing its namesake but instead relying on the data passed in. fetchImagePromise() from content.js, which uses XHR, might be reusable but it's currently very decouple with the localStorage logic.