wingman-jr-addon / wingman_jr

This is the official repository (https://github.com/wingman-jr-addon/wingman_jr) for the Wingman Jr. Firefox addon, which filters NSFW images in the browser fully client-side: https://addons.mozilla.org/en-US/firefox/addon/wingman-jr-filter/ Optional DNS-blocking using Cloudflare's 1.1.1.1 for families! Also, check out the blog!
https://wingman-jr.blogspot.com/

Character encoding strikes back #186

Closed wingman-jr-addon closed 1 year ago

wingman-jr-addon commented 1 year ago

User Drago gave me some great feedback on the ongoing battle to make character encoding detection work flawlessly. See #70 for past history.

On some websites (e.g. https://winfuture.de/news,123262.html) special characters like "ä", "ö", "ü", "ß" and probably others become broken and are shown as �. The developer has already reduced problematic pages like these to a minimum, so it's not a big deal.

Having an actual site to check against helps so much! I can reproduce the issue.

wingman-jr-addon commented 1 year ago

I looked into this a bit, and the issue seems to be that we receive raw bytes that may NOT be UTF-8 encoded, yet we always write them back out as UTF-8 using TextEncoder. This is certainly not the fully correct way to handle it; however, Firefox's TextEncoder doesn't support other character sets. (See https://github.com/wingman-jr-addon/wingman_jr/blob/03523f1bf06ed4b42693a0a3d56ca888342fbfc3/background.js#L641, https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder) I'm playing around with this PR as a possible solution, but it may introduce other issues, as I haven't checked the regression tests yet: https://github.com/wingman-jr-addon/wingman_jr/pull/187
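
For anyone following along, here is a minimal sketch of the failure mode described above. It is not the add-on's actual code: the helper names `charsetFromContentType` and `decodeBody` are made up for illustration, and it assumes the page's charset can be read from a `Content-Type` header. It only relies on the standard `TextDecoder` (which accepts many encoding labels) and `TextEncoder` (which only emits UTF-8).

```javascript
// Bytes for "äöüß" as a server would send them in windows-1252 / ISO-8859-1.
const latin1Bytes = new Uint8Array([0xe4, 0xf6, 0xfc, 0xdf]);

// Wrong charset: none of these bytes form a valid UTF-8 sequence,
// so the decoder substitutes U+FFFD (shown as "�") for each one.
const wrong = new TextDecoder('utf-8').decode(latin1Bytes);

// Hypothetical helper: pull the charset label out of a Content-Type value,
// falling back to UTF-8 when none is declared.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode raw bytes using the declared charset.
// TextDecoder throws a RangeError for labels it does not know,
// in which case this sketch falls back to UTF-8.
function decodeBody(bytes, contentType) {
  let decoder;
  try {
    decoder = new TextDecoder(charsetFromContentType(contentType));
  } catch (e) {
    decoder = new TextDecoder('utf-8');
  }
  return decoder.decode(bytes);
}

const right = decodeBody(latin1Bytes, 'text/html; charset=ISO-8859-1');

// TextEncoder always produces UTF-8, so each of these characters
// becomes two bytes on the way back out.
const reencoded = new TextEncoder().encode(right);

console.log(JSON.stringify(wrong), JSON.stringify(right), reencoded.length);
```

Because the re-encoded bytes are UTF-8, the page would presumably also need to be interpreted as UTF-8 afterwards, or the mismatch just moves elsewhere.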

wingman-jr-addon commented 1 year ago

All tests passed after some tweaks. Keeping an eye on this for regressions.