machawk1 / warcreate

Chrome extension to "Create WARC files from any webpage"
https://warcreate.com
MIT License

Extension not working on most websites #128

Closed: gjvnq closed this issue 3 years ago

gjvnq commented 3 years ago

I tried using this extension on https://warcreate.com/ and it worked. However, when I try to use it on https://news.ycombinator.com it fails silently.

The console only shows:

content.js:100 Asynchronous image fetch
content.js:101 ALL IMAGES ARE NOT REPRESENTED HERE! CSS ONES ARE MISSING
(anonymous) @ content.js:101
content.js:11 https://news.ycombinator.com/y18.gif
content.js:13 Fetching entity
content.js:21 Normalizing
content.js:25 Image data normalized
content.js:28 Associating in JS
content.js:42 Resolving image fetch promise
content.js:11 https://news.ycombinator.com/s.gif
content.js:13 Fetching entity
content.js:21 Normalizing
content.js:25 Image data normalized
content.js:28 Associating in JS
content.js:42 Resolving image fetch promise
content.js:11 https://news.ycombinator.com/grayarrow.gif
content.js:13 Fetching entity
content.js:21 Normalizing
content.js:25 Image data normalized
content.js:28 Associating in JS
content.js:42 Resolving image fetch promise

That test was run in incognito mode with no other extensions enabled, and the console was set to show all messages (including verbose).

On some websites (like GitHub) I sometimes see the following message on the console:

content.js:216 Uncaught (in promise) Error: Attempting to use a disconnected port object
    at content.js:216
(anonymous) @ content.js:216
async function (async)
(anonymous) @ content.js:172

machawk1 commented 3 years ago

@gjvnq Thanks for the report. I will have a look.

machawk1 commented 3 years ago

I was able to replicate this issue with the current main branch, which rules out a delta between the version on the Chrome Store and the source.

machawk1 commented 3 years ago

From source, I get this issue when trying to create a WARC from the URL supplied.

[screenshot: hn]

machawk1 commented 3 years ago

This seems to be caused by the markup:

<script type='text/javascript' src='hn.js?vbGHMem4meAJX9ahZqcC'></script></html>

at the very end of the HN HTML source.

The constructor for a URL object in WARCreate's warcGenerator.js:

parts[0] = (new window.URL(parts[0])).href

fails to construct the URL from "hn.js?vbGHMem4meAJX9ahZqcC" and throws. The error should be caught and reported, and the program flow should be allowed to continue.
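A minimal sketch of the failure described above, assuming that WARCreate passes the raw `src` string straight into the constructor with no base. `window.URL` in the extension is the same WHATWG URL constructor that Node.js exposes as the global `URL`; the helper `tryParse` is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: parse a URL string and report success or failure
// instead of letting the exception escape.
function tryParse(input, base) {
  try {
    // Without a base, only absolute URLs are accepted.
    return { ok: true, href: new URL(input, base).href };
  } catch (err) {
    // Unparseable input surfaces as a TypeError ("Invalid URL").
    return { ok: false, error: err };
  }
}

// The script reference found at the end of the HN page.
const rawSrc = "hn.js?vbGHMem4meAJX9ahZqcC";

const noBase = tryParse(rawSrc);
console.log(noBase.ok, noBase.error instanceof TypeError); // false true

// Supplying the page URL as the base makes the same string resolvable.
const withBase = tryParse(rawSrc, "https://news.ycombinator.com/");
console.log(withBase.href); // https://news.ycombinator.com/hn.js?vbGHMem4meAJX9ahZqcC
```

The same string parses fine once a base is provided, which suggests the problem lies in how the input is passed to the constructor rather than in the markup itself.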

machawk1 commented 3 years ago

Catching the TypeError and continuing the loop allows the WARC to be generated. Some logic still needs to be added to resolve URIs like hn.js that have no relative indicator. I believe the native window.URL was originally used to handle URIs like this, namely relative ones (e.g., "./hn.js").
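One way the catch-and-continue loop and the relative-URI resolution could fit together is sketched below. This is a hypothetical illustration, not the code from the actual fix: the function name `resolveResourceUris` and the choice of the page URL as the base are assumptions. In a content script, `document.baseURI` would be a natural base because it honors any `<base href>` on the page.

```javascript
// Hypothetical sketch (not WARCreate's actual code): resolve each
// discovered resource reference against the page URL, and skip, rather
// than abort on, any reference that still cannot be parsed.
function resolveResourceUris(pageUrl, rawUris) {
  const resolved = [];
  const skipped = [];
  for (const raw of rawUris) {
    try {
      // Using the page URL as the base lets bare ("hn.js") and
      // dot-relative ("./hn.js") references resolve to absolute URLs.
      resolved.push(new URL(raw, pageUrl).href);
    } catch (err) {
      if (!(err instanceof TypeError)) throw err; // unexpected error: re-throw
      // Report the bad reference and keep going so the WARC can still be built.
      console.warn(`Skipping unparseable URI ${JSON.stringify(raw)}: ${err.message}`);
      skipped.push(raw);
    }
  }
  return { resolved, skipped };
}

// Example input: HN-style references plus one deliberately malformed URL.
const { resolved, skipped } = resolveResourceUris(
  "https://news.ycombinator.com/",
  ["hn.js?vbGHMem4meAJX9ahZqcC", "./hn.js", "s.gif", "http://[bad"]
);
console.log(resolved);
console.log(skipped);
```

Both the bare and the `./`-prefixed forms resolve identically once a base is supplied, so a single code path can cover them. Only references that are malformed even relative to the page are skipped and reported.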

machawk1 commented 3 years ago

This has been resolved in c08bd3a, which should be merged into the main branch shortly and, hopefully, pushed to the Chrome Web Store soon after.

machawk1 commented 3 years ago

@gjvnq A new version of WARCreate (v0.2021.6.28) has been submitted for review to the Chrome Web Store. It should be available once it gets the approval of the admins there.

gjvnq commented 3 years ago

Thanks!

Gabriel Queiroz
