iftechfoundation / ifdb

The software behind the Interactive Fiction Database (IFDB)

Convert the HTML entities in the database to utf-8 #1002

Closed dfabulich closed 2 weeks ago

dfabulich commented 2 weeks ago

This can be done programmatically by going over all the tables, finding the text-like columns, and scanning rows one by one.

In PHP the function is `html_entity_decode`, and in Python it's `html.unescape`. I don't think Node has an equivalent built in.
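A minimal sketch of the row-by-row scan described above, using Python's `html.unescape`. The row shape (dicts with an `id` key), the column names, and the helper names `decode_entities` and `changed_rows` are assumptions for illustration, not IFDB's actual schema or code. The database read and write steps are left out on purpose.

```python
import html


def decode_entities(value):
    """Decode HTML entities in a string; pass non-strings (e.g. NULL/None) through."""
    if not isinstance(value, str):
        return value
    return html.unescape(value)


def changed_rows(rows, text_columns):
    """Yield (row_id, column, new_value) for each cell that decoding would change."""
    for row in rows:
        for column in text_columns:
            old = row.get(column)
            new = decode_entities(old)
            if new != old:
                yield row["id"], column, new


# Hypothetical sample data standing in for rows fetched from a table.
sample_rows = [
    {"id": 1, "title": "Caf&eacute; Noir", "author": "A. Writer"},
    {"id": 2, "title": "&#26085;&#26412;&#35486;", "author": None},
]

updates = list(changed_rows(sample_rows, ["title", "author"]))
for row_id, column, new_value in updates:
    # In a real run this is where an UPDATE for (row_id, column) would go.
    print(row_id, column, new_value)
```

Collecting only the cells that actually change keeps the write set small and makes a dry run easy to review. Note that a single pass decodes one level only, so a double-encoded value such as `&amp;#233;` would come out as `&#233;`. Rich text columns may also need separate handling, since entities like `&lt;` can be intentional there.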

I can write it if you want.

Once that's done, only rich text elements should inject such HTML entities into the page. htmlspecialcharx should otherwise NOT leave &#nnnn; sequences intact.

Originally posted by @salty-horse in https://github.com/iftechfoundation/ifdb/issues/998#issuecomment-2456259315

dfabulich commented 2 weeks ago

I worked on this today!