wingman-jr-addon / wingman_jr

This is the official repository (https://github.com/wingman-jr-addon/wingman_jr) for the Wingman Jr. Firefox addon, which filters NSFW images in the browser fully client-side: https://addons.mozilla.org/en-US/firefox/addon/wingman-jr-filter/ Optional DNS-blocking using Cloudflare's 1.1.1.1 for families! Also, check out the blog: https://wingman-jr.blogspot.com/

3.4.0 Regresses Certain Characters #206

Closed: wingman-jr-addon closed this issue 4 weeks ago

wingman-jr-addon commented 1 month ago

LinkedIn on 3.3.6: [image]
LinkedIn on 3.4.0: [image]

Note that the dot no longer translates correctly. This seems similar to #199, but that specific case does not appear to have regressed.

wingman-jr-addon commented 1 month ago

OK, I think I figured out what triggered this. LinkedIn declares itself as an HTML5 document but does not specify a character encoding via a charset parameter, a meta tag, etc. In that case I believe the encoding is generally chosen from the locale plus heuristics, which I believe would usually fall back to iso-8859-1/Windows-1252. However, decoding with that fallback goes wrong here and causes mojibake. Catching that specific scenario and falling back to utf-8 only temporarily, on a per-chunk basis, looks like it resolves the issue.
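
For illustration only, here is a minimal Python sketch of the idea described above. It is not the addon's actual code (the addon runs as JavaScript in the browser). The helper name `decode_chunk`, its parameters, and the use of a middle dot as the affected character are all assumptions made for the example. It shows how UTF-8 bytes decoded as Windows-1252 turn into mojibake, and how trying UTF-8 for a single chunk when no encoding is declared avoids that.

```python
from typing import Optional


def decode_chunk(chunk: bytes, declared_encoding: Optional[str] = None) -> str:
    """Decode one response chunk (hypothetical helper, not from the addon).

    If the page declared an encoding, trust it. Otherwise try strict UTF-8
    for this chunk only, and use Windows-1252 if the bytes are not valid UTF-8.
    """
    if declared_encoding is not None:
        return chunk.decode(declared_encoding, errors="replace")
    try:
        # Strict decode: raises UnicodeDecodeError on invalid UTF-8.
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed default when nothing is declared and UTF-8 does not fit.
        return chunk.decode("cp1252", errors="replace")


# Assumed example text: a middle dot, which is two bytes (0xC2 0xB7) in UTF-8.
utf8_bytes = "a · b".encode("utf-8")

# Decoding those bytes as Windows-1252 produces mojibake.
mojibake = utf8_bytes.decode("cp1252")  # "a Â· b"

# With no declared encoding, the per-chunk UTF-8 attempt keeps the dot intact.
fixed = decode_chunk(utf8_bytes)  # "a · b"
```

One limitation of this sketch: a chunk boundary that splits a multi-byte UTF-8 sequence would make the strict decode fail for that chunk, so a real implementation would likely need a streaming decoder that carries partial sequences across chunks.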

Test code is on branch https://github.com/wingman-jr-addon/wingman_jr/tree/fallback-to-utf8

wingman-jr-addon commented 4 weeks ago

Partially fixed by #207, which is enough for now.