webrecorder / wabac.js

wabac.js - Web Archive Browsing Augmentation Client
https://replayweb.page
GNU Affero General Public License v3.0

JavaScript files encoded as UTF-16 BE/LE, with or without a BOM (byte order mark), are not processed. #155

Closed ARiedijk closed 7 months ago

ARiedijk commented 8 months ago

When a JavaScript file is encoded as UTF-16 BE/LE, with or without a BOM (byte order mark), that file is not processed.

The function parseLetConstGlobals(text) is not called for such files because the async getText(isUTF8=false) function in response.js does not examine the read buffer to determine which encoding applies. The browser's built-in JavaScript parser does perform this detection. I have attached an example WARC file: utf8-utf16-js-test.warc.gz

ikreymer commented 7 months ago

Thanks for the repro sample! The PR in #160 should fix this, checking the BOM and applying the proper decoding for any text content.
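
For readers following along, here is a minimal sketch of BOM-based detection, not the actual code from #160. The helper names `sniffEncoding` and `decodeText` are hypothetical. The BOM-less fallback is only a heuristic of my own, assuming mostly-ASCII source where every second byte is zero; it is not something the PR is known to do. It relies on the standard `TextDecoder` API, which strips a matching BOM by default; whether the `utf-16be` label is available can depend on the runtime.

```javascript
// Hypothetical helpers for illustration; not part of wabac.js.

// Guess the text encoding of a Uint8Array from its leading bytes.
function sniffEncoding(bytes) {
  // 1. Explicit byte order marks.
  if (bytes.length >= 3 &&
      bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2) {
    if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
    if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  }

  // 2. No BOM: heuristic (an assumption, see lead-in). In mostly-ASCII
  //    UTF-16 text, one byte of every pair is zero.
  const n = Math.min(bytes.length & ~1, 64); // even-sized sample
  let evenZeros = 0;
  let oddZeros = 0;
  for (let i = 0; i < n; i += 2) {
    if (bytes[i] === 0) evenZeros++;
    if (bytes[i + 1] === 0) oddZeros++;
  }
  const pairs = n / 2;
  if (pairs >= 2) {
    if (oddZeros === pairs && evenZeros === 0) return "utf-16le";
    if (evenZeros === pairs && oddZeros === 0) return "utf-16be";
  }

  // 3. Default when nothing else matches.
  return "utf-8";
}

// Decode the buffer with the sniffed encoding.
// TextDecoder removes a leading BOM unless ignoreBOM is set.
function decodeText(bytes) {
  return new TextDecoder(sniffEncoding(bytes)).decode(bytes);
}
```

With something like this in place, the decoded string could be passed on to the rewriting step regardless of how the script was originally encoded.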

ikreymer commented 7 months ago

Fixed by #160