iftechfoundation / ifdb

The software behind the Interactive Fiction Database (IFDB)

Mojibake and broken queries for some non-ASCII characters in series and genre lists #507

Closed jtn20 closed 1 week ago

jtn20 commented 1 year ago

(This is probably related to iftechfoundation/ifdb#508, but I wasn't 100% sure so I raised a new issue.)

The IFDB advanced search page, https://ifdb.org/search, has two folds: "Show all series names appearing in game listings" and "Show all genres used in game listings".

When I expand these folds, I see mojibake (garbled characters, indicating UTF-8 bytes interpreted as ISO-8859-1) for some, but not all, items that contain non-ASCII characters.
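For anyone unfamiliar with this failure mode, here is a minimal Python sketch of it (not IFDB code; the sample name "Café" is made up). Encoding a string as UTF-8 and then decoding those bytes as ISO-8859-1 turns each multi-byte character into two or more unrelated characters, and pure-ASCII text passes through unchanged.

```python
def as_mojibake(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    sample = "Café"  # hypothetical series/genre name with one non-ASCII letter
    garbled = as_mojibake(sample)
    print(garbled)                   # CafÃ© (U+00E9 is the two UTF-8 bytes C3 A9)
    print(repair_mojibake(garbled))  # Café
```

The repair step only works when the bytes were mis-decoded exactly once, so it is a diagnostic aid rather than a fix for the underlying encoding mismatch.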

Examples of broken display / links:

but others are fine:

Looking at the search page in the browser developer tools, I see that the data for these folds comes from queries of the form:

Issuing these queries myself, I see:

I don't know the details of how the JavaScript subsequently interprets this XML, but this suggests that escaping more characters as entities, so that the XML is ASCII-only, would at the very least work around the problem.
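The workaround suggested above can be sketched in Python (IFDB itself is not written in Python, and the `<names>`/`<name>` element names are hypothetical, not IFDB's actual response format). Markup characters are escaped first, then every non-ASCII character becomes a numeric character reference such as `&#233;`, so the response bytes are pure ASCII and cannot be mis-decoded.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def to_ascii_xml_text(text: str) -> str:
    """Escape &, < and >, then turn each non-ASCII character into &#NNN;."""
    escaped = escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def build_names_xml(names: list[str]) -> str:
    """Build a small hypothetical <names> document that is pure ASCII."""
    items = "".join(f"<name>{to_ascii_xml_text(n)}</name>" for n in names)
    return f'<?xml version="1.0" encoding="UTF-8"?><names>{items}</names>'


if __name__ == "__main__":
    doc = build_names_xml(["Café", "Tom & Jerry"])
    print(doc)  # ...<name>Caf&#233;</name><name>Tom &amp; Jerry</name>...
    # Any conforming XML parser turns the references back into the original text.
    root = ET.fromstring(doc.encode("ascii"))
    print([el.text for el in root])  # ['Café', 'Tom & Jerry']
```

Because ASCII is a subset of both UTF-8 and ISO-8859-1, the document parses the same way whichever of the two encodings the client assumes.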

(Incidentally, one genre name is not the loveliest UTF-8: the work Coke Is It! has a genre that looks like "Children's" but contains the code point U+0092 (manually constructed search link). It looks like it was intended as a Windows-1252 curly apostrophe (byte 0x92, which Windows-1252 maps to U+2019), but in Unicode U+0092 is actually a control character. I was hoping this was crufty ancient data, but it was only added in 2018. Perhaps the best thing to do is to correct that game entry and hope no one does it again.)
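If a broader cleanup were wanted, a check along these lines could find such entries. This is a hypothetical Python helper, not part of IFDB, and it assumes every stray C1 control character (U+0080 to U+009F) came from text that was really Windows-1252. Under that assumption, each one is replaced by the character the same byte value has in Windows-1252; the five byte values that Windows-1252 leaves undefined are kept as-is.

```python
def has_c1_controls(text: str) -> bool:
    """True if the text contains any C1 control character (U+0080..U+009F)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


def fix_c1_as_cp1252(text: str) -> str:
    """Map each C1 control character to its Windows-1252 reading,
    e.g. U+0092 -> U+2019. Bytes undefined in Python's cp1252 codec
    (0x81, 0x8D, 0x8F, 0x90, 0x9D) are left unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                out.append(bytes([cp]).decode("cp1252"))
                continue
            except UnicodeDecodeError:
                pass  # no cp1252 mapping: keep the original character
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    genre = "Children\u0092s"  # the genre described above
    print(has_c1_controls(genre))     # True
    fixed = fix_c1_as_cp1252(genre)
    print(fixed, hex(ord(fixed[8])))  # Children’s 0x2019
```

Running the detector alone and correcting the few hits by hand, as suggested above, avoids trusting the Windows-1252 assumption blindly.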

dfabulich commented 1 week ago

This seems to have been fixed by the fix for #859.

dfabulich commented 1 week ago

Oops, I was wrong. The links display properly, but the searches don't work.

dfabulich commented 1 week ago

Fixed in #1004