machawk1 / warcreate

Chrome extension to "Create WARC files from any webpage"
https://warcreate.com
MIT License

JavaScript-inserted text appears twice upon replay #89

Closed weiglemc closed 7 years ago

weiglemc commented 7 years ago

I maintain http://www.freemasonstreet.org/ and use it sometimes for testing our tools. One thing I've done is use JavaScript to write out parts of the webpage that might change occasionally.

When the page loads, the JavaScript is run and the proper text is added to the page. WARCreate captures the DOM after this text has been added. But then when the memento is replayed in the browser, the JavaScript is run again and the text is added to the page a second time. See screenshot.

I don't know what the solution would be, but we need to be aware of this behavior.

[Image: screen shot 2017-02-14 at 1 18 10 pm]

machawk1 commented 7 years ago

As discussed in person, this bug results from WARCreate's approach of capturing the page after it has potentially been manipulated by JavaScript. Ideally, we would capture the HTML page and all related resources prior to manipulation, then reproduce the manipulation on replay. However, I don't believe we can capture the payload as it comes over the wire, though we can identify the URI-Rs of the resources using the webRequest API.
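For illustration, identifying the URI-Rs with webRequest might look roughly like the sketch below. This is not WARCreate's actual code: `collectRequestUrls` and its `chromeApi` parameter are hypothetical names, and the API object is passed in only so the function can be exercised outside the browser.

```javascript
// Sketch only: record the URL (URI-R) of every request made by one tab.
// `collectRequestUrls` and `chromeApi` are hypothetical names; in an
// extension, `chromeApi` would be the global `chrome` object.
function collectRequestUrls(chromeApi, tabId) {
  const urls = [];

  // webRequest passes a details object; only its URL is kept here.
  // The response payload itself is not available through this event.
  const listener = (details) => {
    urls.push(details.url);
  };

  // Restrict the listener to the tab being archived.
  chromeApi.webRequest.onBeforeRequest.addListener(listener, {
    urls: ['<all_urls>'],
    tabId,
  });

  return {
    urls,
    stop: () => chromeApi.webRequest.onBeforeRequest.removeListener(listener),
  };
}
```

In an extension this would be called as `collectRequestUrls(chrome, tabId)` from the background page, with the `webRequest` permission and matching host permissions declared in the manifest.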

@N0taN3rd Can you verify whether we are able to get the payloads as they are being consumed by Chrome? When WARCreate was originally written, I believe this was not possible (though webRequest didn't exist then; WARCreate has since used webRequest).

N0taN3rd commented 7 years ago

@machawk1 @weiglemc

It is still not possible using the webRequest API, but the Chrome folks have approved it for addition to a future Chrome version if they can get someone to work on it.

You may have some luck hooking into the debugger and extracting the response body.

I tried it for use in WAIL, but I think something on the Electron side made it wonky.

machawk1 commented 7 years ago

@N0taN3rd If I remember correctly, usage of the debugger API in a Chrome extension is only available if the debugger is activated/visible. I have not interfaced with this API before, so I don't know if this is (still?) the case.

N0taN3rd commented 7 years ago

@machawk1 They expose debugger access through a permission of the same name requested in the manifest. I do not believe you need a devtools page to access it.

From the Chrome docs on it: "Use chrome.debugger to attach to one or more tabs to instrument network interaction, debug JavaScript, mutate the DOM and CSS, etc. Use the Debuggee tabId to target tabs with sendCommand and route events by tabId from onEvent callbacks."

WAIL use example and debugger protocol viewer
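As a rough sketch of the debugger approach described above (not the WAIL code and not what WARCreate ended up doing), response bodies could be pulled through the DevTools protocol's Network domain. `captureResponseBodies`, `chromeApi`, and `onBody` are hypothetical names, and the protocol version string `'1.3'` is an assumption; the API object is passed in so the function can be tried outside the browser.

```javascript
// Sketch only: attach chrome.debugger to a tab and hand each response body
// to a callback. `captureResponseBodies`, `chromeApi`, and `onBody` are
// hypothetical names; in an extension, `chromeApi` would be `chrome`.
function captureResponseBodies(chromeApi, tabId, onBody) {
  const target = { tabId };
  const urlsByRequestId = new Map();

  const onEvent = (source, method, params) => {
    if (source.tabId !== tabId) return; // ignore other debuggees

    if (method === 'Network.responseReceived') {
      // Remember which URL this request id belongs to.
      urlsByRequestId.set(params.requestId, params.response.url);
    } else if (method === 'Network.loadingFinished') {
      const url = urlsByRequestId.get(params.requestId);
      if (url === undefined) return;
      urlsByRequestId.delete(params.requestId);

      // The body can only be requested once loading has finished.
      chromeApi.debugger.sendCommand(
        target,
        'Network.getResponseBody',
        { requestId: params.requestId },
        (result) => {
          if (chromeApi.runtime.lastError || !result) return;
          onBody({ url, body: result.body, base64Encoded: result.base64Encoded });
        }
      );
    }
  };

  chromeApi.debugger.onEvent.addListener(onEvent);
  // '1.3' is an assumed protocol version; adjust to what the browser supports.
  chromeApi.debugger.attach(target, '1.3', () => {
    if (chromeApi.runtime.lastError) return;
    chromeApi.debugger.sendCommand(target, 'Network.enable');
  });

  // Returns a function that stops capturing and detaches from the tab.
  return () => {
    chromeApi.debugger.onEvent.removeListener(onEvent);
    chromeApi.debugger.detach(target);
  };
}
```

This needs the `debugger` permission in the manifest, and the page would have to be loaded (or reloaded) after attaching so that the network events are observed.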

machawk1 commented 7 years ago

@weiglemc This has been fixed with 107f4632af47f3be588c8536305819a23643d873.