scrapinghub / shublang

Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
BSD 3-Clause "New" or "Revised" License

Use Unidecode or similar when converting to ASCII #37

Closed: Gallaecio closed this issue 4 years ago

Gallaecio commented 4 years ago

I would suggest making ASCII conversion a function separate from sanitize. For the conversion itself, I would suggest using Unidecode or a similar library, so that non-ASCII characters are replaced with ASCII alternatives when possible instead of being removed.
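
To illustrate the difference, here is a minimal sketch, not shublang's actual code. The names `strip_non_ascii` and `to_ascii` are hypothetical, chosen only for this example. It assumes the third-party Unidecode package may or may not be installed, and falls back to the standard library's `unicodedata` as the "or similar" option, which handles accented Latin letters but drops characters that have no decomposition.

```python
import unicodedata

try:
    # Third-party package: pip install Unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None


def strip_non_ascii(text: str) -> str:
    """Drop every non-ASCII character (the lossy approach)."""
    return text.encode("ascii", "ignore").decode("ascii")


def to_ascii(text: str) -> str:
    """Replace non-ASCII characters with ASCII alternatives when possible.

    Hypothetical helper: uses Unidecode if available, otherwise a
    stdlib fallback that only strips combining marks after NFKD
    decomposition.
    """
    if unidecode is not None:
        return unidecode(text)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii("café"))  # caf
print(to_ascii("café"))         # cafe
```

Keeping `to_ascii` separate from any sanitizing step lets callers opt in to transliteration without changing what sanitize does.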