salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Normalize special characters in string #539

Open winterslu opened 3 years ago


Problem: https://github.com/salesforce/TransmogrifAI/pull/534

Solution: This issue could be addressed globally and properly by normalizing special characters into a single canonical form. For example, '@' appears in several variant forms ('@', '﹫', '@'), all of which can be replaced by the standard '@'.
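One standard way to implement this kind of folding is Unicode compatibility normalization (NFKC), available on the JVM through `java.text.Normalizer`. NFKC maps, for example, FULLWIDTH COMMERCIAL AT (U+FF20) and SMALL COMMERCIAL AT (U+FE6B) to the ASCII '@'. The sketch below is only an illustration of that technique, not TransmogrifAI's API; the object name `SpecialCharNormalizer` is hypothetical, and where such a step would plug into the library's text pipeline is left open.

```scala
import java.text.Normalizer

// Hypothetical helper, not part of TransmogrifAI: folds compatibility
// variants (fullwidth, small-form, etc.) into canonical characters via NFKC.
object SpecialCharNormalizer {

  /** Returns the NFKC-normalized string; passes null through unchanged,
    * since text columns in Spark datasets may contain nulls. */
  def normalize(s: String): String =
    if (s == null) null else Normalizer.normalize(s, Normalizer.Form.NFKC)

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "user\uFF20example.com", // FULLWIDTH COMMERCIAL AT
      "user\uFE6Bexample.com", // SMALL COMMERCIAL AT
      "user@example.com"       // ASCII '@'
    )
    // Each sample normalizes to "user@example.com".
    samples.foreach(s => println(normalize(s)))
  }
}
```

Note that NFKC is broader than '@' alone: it also folds other compatibility characters (for instance the ligature "ﬁ" becomes "fi" and superscript "²" becomes "2"). If that is too aggressive for a given feature, an explicit replacement table limited to the characters of interest would be the narrower alternative.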

Alternatives

Additional context: https://github.com/salesforce/TransmogrifAI/pull/534