iftechfoundation / ifdb-suggestion-tracker

Bugs and feature requests for a future IFDB update

Mojibake and broken queries for some non-ASCII characters in series and genre lists #372

Open jtn20 opened 1 year ago

jtn20 commented 1 year ago

(This is probably related to #371, but I wasn't 100% sure so I raised a new issue.)

The IFDB advanced search page (https://ifdb.org/search) has two folds: "Show all series names appearing in game listings" and "Show all genres used in game listings".

On expanding these folds, I see mojibake (wrong characters, indicating UTF-8 interpreted as ISO-8859-1) for some (but not all) items containing non-ASCII characters.
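For anyone unfamiliar with this kind of mojibake, here is a minimal Python sketch of the mechanism. It is purely illustrative: the sample string is made up rather than taken from IFDB, and the helper names `mojibake` and `repair` are hypothetical.

```python
def mojibake(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


sample = "Pokémon"          # made-up example, not an actual IFDB series or genre
garbled = mojibake(sample)

print(garbled)              # PokÃ©mon
print(repair(garbled))      # Pokémon
```

Each non-ASCII character becomes two or more Latin-1 characters, while pure-ASCII names pass through unchanged.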

Examples of broken display / links:

but others are fine:

Looking at search and browser developer tools, I see that the data for these comes from queries of the form

Issuing these queries myself, I see:

I don't know the details of how the JavaScript subsequently interprets this XML, but that suggests that emitting more characters as entities, so that the XML is ASCII-only, would at the very least work around this problem.
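To illustrate what I mean by ASCII-only XML, here is a rough Python sketch. It is not IFDB's code (I have not tried to reproduce the server side), and the function name `to_ascii_xml_text` and the sample string are made up. It escapes the XML special characters first, then replaces every non-ASCII character with a numeric character reference.

```python
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> str:
    """Return text safe for XML element content, using only ASCII characters."""
    # Escape &, < and > first, so the '&' of the numeric references
    # added in the next step is not escaped a second time.
    escaped = escape(text)
    # Replace each non-ASCII character with a decimal reference like &#233;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_xml_text("Café & Crème"))  # Caf&#233; &amp; Cr&#232;me
```

Because the output contains no bytes above 0x7F, it reads the same whether the consumer assumes UTF-8 or ISO-8859-1.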

(Incidentally, there is one genre name that's not the most lovely UTF-8: the work Coke Is It! has a genre that looks like "Children's" but contains Unicode code point U+0092 (manually constructed search link). This looks like it's intended to be a curly apostrophe in the Windows-1252 style, but is actually a control character. I was hoping this was crufty ancient data, but actually it was only added in 2018. Perhaps the best thing to do is to correct that game entry and hope no-one does it again.)
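For reference, a small Python sketch of why U+0092 looks like a misplaced Windows-1252 apostrophe, and one possible way to clean such data. The helper name `fix_c1_controls` is hypothetical, and the sample string is reconstructed from the description above rather than copied from the database.

```python
import unicodedata


def fix_c1_controls(text: str) -> str:
    """Reinterpret C1 control characters (U+0080..U+009F) as Windows-1252 bytes."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # a few byte values are undefined in cp1252; leave them alone
        out.append(ch)
    return "".join(out)


genre = "Children\u0092s"                # reconstructed sample, an assumption
print(unicodedata.category("\u0092"))    # Cc (a control character)

fixed = fix_c1_controls(genre)
print(fixed == "Children\u2019s")        # True
print(unicodedata.name(fixed[8]))        # RIGHT SINGLE QUOTATION MARK
```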