zeeguu / api

API for tracking a learner's progress when reading materials in a foreign language and recommending further personalized exercises and readings.
https://zeeguu.org
MIT License

Filter out articles with strange characters in the title for Danish #167

Closed: mircealungu closed this 2 months ago

mircealungu commented 3 months ago

For example, for Danish:

| 2515497 | Politi fÃ¥r kritik for at undersøge cyklist 'i bund' – men anholdelse var o.k. |
| 2514845 | 17-Ã¥rig idømt fængsel – købte pistol af politiagent |

tfnribeiro commented 2 months ago

It seems that the characters map in the following way:

The characters might be slightly different at the start. Should I add a step in the rss_feed code to replace these characters? To fix the earlier instances, we can just run an SQL update. They all come from the same feed (136), which is Politiken, like we saw last time.
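A minimal Python sketch of such a replacement step. The names (`build_mojibake_map`, `fix_mojibake`) are hypothetical and not taken from the rss_feed code. It assumes the garbling comes from UTF-8 bytes being decoded as cp1252 or latin-1, which matches `Ã¥` standing in for `å` in the examples above. It replaces known garbled sequences instead of re-decoding the whole title. That choice matters because the same title mixes broken and correct letters (`fÃ¥r` next to `undersøge`), so a whole-string re-decode would fail on it.

```python
# Assumption: the garbling is UTF-8 bytes decoded as cp1252 (or latin-1),
# so "å" (bytes C3 A5) turns into "Ã¥".
DANISH_LETTERS = "æøåÆØÅ"


def build_mojibake_map(letters: str) -> dict[str, str]:
    """Map the garbled form of each letter back to the intended letter."""
    mapping: dict[str, str] = {}
    for ch in letters:
        raw = ch.encode("utf-8")
        # Uppercase letters garble differently under cp1252 and latin-1,
        # so both variants are registered.
        for wrong_codec in ("cp1252", "latin-1"):
            mapping[raw.decode(wrong_codec)] = ch
    return mapping


MOJIBAKE_MAP = build_mojibake_map(DANISH_LETTERS)


def fix_mojibake(title: str) -> str:
    """Replace each known garbled two-character sequence with its letter.

    Correctly encoded letters, such as the "ø" in "undersøge", are left as is.
    """
    for garbled, letter in MOJIBAKE_MAP.items():
        title = title.replace(garbled, letter)
    return title
```

For example, `fix_mojibake("17-Ã¥rig idømt fængsel")` gives `"17-årig idømt fængsel"`.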

Again, it seems this happens only sometimes, so it's one of those strange issues.

mircealungu commented 2 months ago

This is so silly!

We would have to run the SQL update, and also add a little hook: every time we detect these characters in the string, either
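The comment breaks off after "either", so the alternatives it was about to list are unknown. What follows is a minimal sketch of only the two stated parts: the detection check for the hook and the update of already stored titles. It makes three assumptions. The garbling is UTF-8 read as cp1252 or latin-1. The table and column names (`article`, `id`, `title`, `feed_id`) are hypothetical. An in-memory SQLite database stands in for the real one. In this sketch the detected titles are simply repaired, but that is only one possible branch.

```python
import re
import sqlite3

DANISH_LETTERS = "æøåÆØÅ"
# Garbled form -> intended letter (assumption: UTF-8 misread as cp1252/latin-1).
GARBLED = {
    ch.encode("utf-8").decode(codec): ch
    for ch in DANISH_LETTERS
    for codec in ("cp1252", "latin-1")
}
GARBLED_RE = re.compile("|".join(re.escape(g) for g in GARBLED))


def has_mojibake(title: str) -> bool:
    """Hook check: does the title contain any known garbled sequence?"""
    return GARBLED_RE.search(title) is not None


def repair(title: str) -> str:
    """Swap every garbled sequence for the letter it stands for."""
    return GARBLED_RE.sub(lambda m: GARBLED[m.group(0)], title)


def backfill_feed(conn: sqlite3.Connection, feed_id: int) -> int:
    """Repair titles already stored for one feed; return how many changed.

    Hypothetical schema: article(id, title, feed_id).
    """
    rows = conn.execute(
        "SELECT id, title FROM article WHERE feed_id = ?", (feed_id,)
    ).fetchall()
    changed = 0
    for article_id, title in rows:
        if has_mojibake(title):
            conn.execute(
                "UPDATE article SET title = ? WHERE id = ?",
                (repair(title), article_id),
            )
            changed += 1
    conn.commit()
    return changed
```

Calling `backfill_feed(conn, 136)` would touch only rows from feed 136 that actually contain a garbled sequence.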