uc-love-data-week / uc-love-data-week.github.io

https://uc-love-data-week.github.io
GNU General Public License v3.0

Allow non-ASCII text in JSON feed #76

Closed devnich closed 9 months ago

devnich commented 9 months ago

Don't merge this until Kat has tested it.

kekoziar commented 9 months ago

> Don't merge this until Kat has tested it.

The feed reads and renders the text, but if someone searches with the straight single quote (Proquest's), the entry won't appear in the search results, because a standard keyboard has no curly apostrophe (most current word processors insert it by auto-conversion). Not that I'd expect a typical query to include punctuation...

Are there advantages/disadvantages to either? Is one more accessible than another, say for screen readers?

devnich commented 9 months ago

I've modified the feed builder to strip out the "smart" apostrophe after it creates the JSON document.
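A minimal sketch of that post-processing step, for illustration only. The function name `build_feed`, the entry fields, and the choice to swap U+2019 for a straight apostrophe (rather than delete it) are assumptions, not the actual feed builder code:

```python
import json

CURLY_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the "smart" apostrophe


def build_feed(entries):
    """Serialize entries to JSON, then swap curly apostrophes for straight ones.

    Hypothetical helper: the real feed builder may be structured differently.
    """
    # ensure_ascii=False keeps other non-ASCII text (e.g. accented names) as UTF-8.
    text = json.dumps(entries, ensure_ascii=False, indent=2)
    # A straight apostrophe needs no escaping inside a JSON string,
    # so replacing it after serialization leaves the document valid.
    return text.replace(CURLY_APOSTROPHE, "'")


feed = build_feed([{"title": "Proquest\u2019s guide", "presenter": "José"}])
print(feed)
```

Only the apostrophe is touched here; every other non-ASCII character passes through unchanged.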

More generally, we're managing two competing issues:

  1. No one's keyboard has the curly apostrophe, as you point out.
  2. In general, resources on the internet should be encoded as UTF-8.

My preference is to keep everything in UTF-8 unless there are specific usability issues; otherwise we're in a situation where, e.g., people's correctly spelled Spanish names (groovy, inclusive) are converted into ASCII-safe bytes (lame, unkind). We should assume that any modern screen reader can handle correctly encoded UTF-8.
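To make the trade-off concrete, here is a small sketch assuming the feed is serialized with Python's standard `json` module (the thread doesn't say which serializer the builder uses, and the sample name is made up). By default `json.dumps` escapes non-ASCII characters; `ensure_ascii=False` writes them through as-is:

```python
import json

record = {"name": "María"}  # sample data, not from the real feed

ascii_safe = json.dumps(record)                   # default: non-ASCII escaped
utf8_text = json.dumps(record, ensure_ascii=False)  # characters kept as written

print(ascii_safe)  # {"name": "Mar\u00eda"}
print(utf8_text)   # {"name": "María"}

# Both forms decode to the same value; the difference is in the raw file
# that a person (or a naive text search) sees.
assert json.loads(ascii_safe) == json.loads(utf8_text) == record
```

Either output is valid JSON, so a compliant parser is unaffected; the UTF-8 form is simply the one that stays readable in the served file.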

(When I inspect the file served by the webpage, Safari reports it as Content-Type: application/json; charset=utf-8, which is correct.)

kekoziar commented 9 months ago

I agree that we should keep everything as UTF-8 so that diacritical marks are preserved, as you noted. Was the original file that threw the error on \" encoded as UTF-8? Is that what started this? I can double-check the parser to see if I missed a step in converting \".

devnich commented 9 months ago

I believe it was Unicode (nothing else changed in the code besides replacing \" with '). The \" should be legal JSON (cf. https://datatracker.ietf.org/doc/html/rfc7159.html#section-7), but it's no big deal to replace it.
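As a quick check of the claim that \" is legal inside a JSON string, a sketch using Python's standard `json` module (the sample title is invented, and this says nothing about how the site's own parser behaves):

```python
import json

# A raw JSON document whose string value contains escaped double quotes.
raw = r'{"title": "The \"Love Data\" workshop"}'

parsed = json.loads(raw)
print(parsed["title"])  # The "Love Data" workshop

# Serializing a string that contains double quotes produces the same escape.
encoded = json.dumps('say "hi"')
print(encoded)  # "say \"hi\""
```

If a standard parser accepts the escape, an error on \" more likely comes from a later step (for example, a hand-written conversion) than from the JSON itself.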

devnich commented 9 months ago

@stephlabou This is ready to merge.