devnich closed 9 months ago
Don't merge this until Kat has tested it.
It reads and renders the curly apostrophe, but if someone searches using the straight single quote (as in Proquest's), the record won't appear in the search results, because a standard keyboard has no key for the curly apostrophe (most current word processors auto-convert to it). Not that we'd expect a typical query to include punctuation...
Are there advantages/disadvantages to either? Is one more accessible than another, say for screen readers?
I've modified the feed builder to strip out the "smart" apostrophe after it creates the JSON document.
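A minimal sketch of what such a post-processing step could look like, assuming the feed is available as parsed JSON data. The function name `strip_smart_apostrophes`, the sample record, and the choice to also map the left single quote are illustrative assumptions, not the actual feed builder code.

```python
import json

# Assumption: map both curly single quotes (U+2018, U+2019) to the ASCII apostrophe.
_APOSTROPHE_TABLE = str.maketrans({"\u2018": "'", "\u2019": "'"})


def strip_smart_apostrophes(value):
    """Recursively replace curly apostrophes in every string value of JSON-like data."""
    if isinstance(value, str):
        return value.translate(_APOSTROPHE_TABLE)
    if isinstance(value, list):
        return [strip_smart_apostrophes(item) for item in value]
    if isinstance(value, dict):
        # Keys are left untouched; only values are normalized.
        return {key: strip_smart_apostrophes(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


# Hypothetical record; the accented characters must survive the cleanup.
feed = {"title": "Women\u2019s Studies", "authors": ["Mar\u00eda Nu\u00f1ez"]}
cleaned = strip_smart_apostrophes(feed)

# ensure_ascii=False keeps non-ASCII characters as UTF-8 instead of \uXXXX escapes.
print(json.dumps(cleaned, ensure_ascii=False))
```

Only the apostrophe is touched, so diacritics in names stay intact. Applying the same function to incoming search queries would make both sides of the comparison agree.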
More generally, we're managing two competing issues:
My preference is to keep everything in UTF-8 unless there are specific usability issues; otherwise we're in a situation where, e.g., people's correctly-spelled Spanish names (groovy, inclusive) are converted into ASCII-safe bytes (lame, unkind). We should assume that any modern screen reader can handle correctly encoded UTF-8.
(When I inspect the file served by the webpage, Safari says that it is Content-Type: application/json; charset=utf-8, which is correct).
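To illustrate the trade-off, here is a small sketch using Python's standard `json` and `unicodedata` modules. The sample record is hypothetical, and the ASCII-folding step is just one possible way a name could be flattened, shown for contrast rather than as anything the feed builder does.

```python
import json
import unicodedata

record = {"author": "Jos\u00e9 Pe\u00f1a"}  # hypothetical sample record

# Option 1: keep the characters as-is and serialize as UTF-8.
utf8_text = json.dumps(record, ensure_ascii=False)
utf8_bytes = utf8_text.encode("utf-8")

# Option 2: the json module's default, which emits \uXXXX escapes.
# This is pure ASCII on the wire but still decodes to the same name.
escaped_text = json.dumps(record)

# Option 3 (lossy): strip diacritics to force the name into plain ASCII.
folded = (
    unicodedata.normalize("NFKD", record["author"])
    .encode("ascii", "ignore")
    .decode("ascii")
)

roundtrip_utf8 = json.loads(utf8_bytes.decode("utf-8"))
roundtrip_escaped = json.loads(escaped_text)
```

Options 1 and 2 both round-trip to the original name; only the lossy folding in option 3 changes the spelling.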
I agree that we should keep everything as UTF-8 for proper diacritical marks, as you noted. Was the original file that was throwing the error on \\" UTF-8? Is that what started this? I can double-check the parser to see if I missed a step in converting \\".
I believe it was Unicode (nothing else changed in the code besides replacing \" with '). The \" should be legal JSON (cf. https://datatracker.ietf.org/doc/html/rfc7159.html#section-7), but it's no big deal to replace it.
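A quick check with Python's standard `json` module supports the point that an escaped double quote is legal inside a JSON string. The sample documents are made up, and the double-backslash case is only one way a parse error could arise, not a diagnosis of what happened with the original file.

```python
import json

# A backslash-escaped double quote inside a JSON string is valid.
valid_doc = r'{"title": "The \"Smart\" Apostrophe"}'
parsed = json.loads(valid_doc)

# With a doubled backslash, \\ is an escaped backslash and the following
# quote terminates the string early, so the rest of the document is invalid.
broken_doc = r'{"title": "The \\"Smart\\" Apostrophe"}'
try:
    json.loads(broken_doc)
    broken_parses = True
except json.JSONDecodeError:
    broken_parses = False

print(parsed["title"])
print(broken_parses)
```

If the feed ever passes through two escaping steps, the second case is the kind of failure to look for.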
@stephlabou This is ready to merge.