phHartl / eu-judgement-analyse

Quantitative analysis of judgments of the European Court of Justice
MIT License

Translate \u unicode characters in response to readable format #25

Closed (thomfischer closed this 4 years ago)

thomfischer commented 4 years ago

not fixed. example response: 'party names': 'In Case C‑341/15, REQUEST for a preliminary ruling under Article\xa0267 TFEU from the Verwaltungsgericht Wien (Administrative Court of Vienna, Austria), made by decision of 22\xa0June 2015,

\xa0 (the non-breaking space, U+00A0, here directly in front of 267) has to be replaced with a regular space
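A minimal sketch of that replacement, using a fragment of the response quoted above. The helper name replace_nbsp is hypothetical and not taken from the repository. It also shows that the string holds a real U+00A0 character and that the \xa0 escape is only how Python's repr() displays it.

```python
# Fragment of the example response; "\xa0" is a single non-breaking space.
raw = "Article\xa0267 TFEU"


def replace_nbsp(text: str) -> str:
    """Replace every non-breaking space (U+00A0) with a regular space.

    Hypothetical helper for illustration; not part of the project code.
    """
    return text.replace("\xa0", " ")


cleaned = replace_nbsp(raw)

# repr() escapes the non-breaking space, so it shows up as \xa0 in
# dict/list dumps even though the string contains one real character.
print(repr(raw))
print(repr(cleaned))
```

This only touches U+00A0; other special characters in the text (for example the non-breaking hyphen in the case number) are left as they are.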

thomfischer commented 4 years ago

escaped unicode seems to be present only in the saved .json files. we will have to see whether it is still there once the text has been inserted into the database
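One possible explanation, assuming the .json files are written with Python's json module and its default settings (the thread does not show how they are saved): json.dump and json.dumps escape every non-ASCII character as \uXXXX unless ensure_ascii=False is passed. The escape exists only in the file; loading the JSON gives back the original character. The record below is made-up sample data.

```python
import json

# Made-up record with a non-breaking space, mirroring the example response.
record = {"party names": "Article\xa0267 TFEU"}

# Default behaviour (ensure_ascii=True): non-ASCII becomes a \uXXXX escape.
escaped = json.dumps(record)

# With ensure_ascii=False the character is written literally.
literal = json.dumps(record, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms decode to the same Python object, so the escape is purely
# a property of the serialized text, not of the data.
assert json.loads(escaped) == json.loads(literal) == record
```

If the files are written this way, the escapes would not be expected to survive a round trip through json.load before the text reaches the database.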

phHartl commented 4 years ago

unicodedata.normalize("NFKD", text) seems to do the trick
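A short sketch of what that call does to the example text. The sample strings are illustrative. NFKD applies compatibility decomposition, which maps the non-breaking space to a regular space, but it also splits accented letters into a base letter plus a combining mark. NFKC is shown as a possible alternative that recomposes them; whether that matters here depends on how the text is used later.

```python
import unicodedata

# Fragment of the example response with two non-breaking spaces.
sample = "Article\xa0267 TFEU, made by decision of 22\xa0June 2015"

nfkd = unicodedata.normalize("NFKD", sample)
print(repr(nfkd))

# Side effect to be aware of: NFKD decomposes accented characters,
# so the string gets longer; NFKC composes them again afterwards.
accented = "caf\u00e9\xa0au lait"
decomposed = unicodedata.normalize("NFKD", accented)
composed = unicodedata.normalize("NFKC", accented)

print(len(accented), len(decomposed), len(composed))
```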

thomfischer commented 4 years ago

once saved in the database, there are no more escaped unicode characters.