radiolarian / AO3Scraper

A Python scraper for getting fan fiction content and metadata from Archive of Our Own.

Better Unicode-to-ASCII conversion using the Unidecode library #1

Closed: kolvia closed this 7 years ago

kolvia commented 7 years ago

Uses the Unidecode library to convert Unicode characters to their closest ASCII representation. Unidecode works well for English text with the occasional non-ASCII character (e.g. "fiancé" becomes "fiance"). This fixes the issue with curly quotation marks, but it doesn't address stories written in non-Latin scripts such as Chinese.
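A minimal sketch of the difference, assuming the `Unidecode` package from PyPI (imported as `unidecode`). `naive_ascii` is a hypothetical stdlib baseline (NFKD normalization, then dropping anything non-ASCII) and is not necessarily what the scraper did before this PR. `to_ascii` is likewise a hypothetical helper name. The baseline still turns "fiancé" into "fiance", but it silently deletes curly quotes, while Unidecode transliterates them to straight quotes.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party: pip install Unidecode
except ImportError:  # keep the sketch runnable without the dependency
    unidecode = None


def naive_ascii(text: str) -> str:
    """Hypothetical baseline: split accents off letters, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_ascii(text: str) -> str:
    """Hypothetical helper: use Unidecode if installed, else the baseline."""
    if unidecode is not None:
        return unidecode(text)
    return naive_ascii(text)


if __name__ == "__main__":
    sample = "\u201cMy fianc\u00e9\u2019s here,\u201d she said."
    print(naive_ascii(sample))  # curly quotes and the apostrophe are lost
    print(to_ascii(sample))     # with Unidecode: straight quotes are kept
```

Neither function solves the Chinese case: the baseline drops every Han character, and any ASCII transliteration discards the original script, which is the limitation the PR description points out.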