kevindeyne / vardogr

Vardøgr is a CLI that can push production-like data to test environments securely and at scale
https://kevindeyne.github.io/vardogr
MIT License

Integrate language detection #7

Open kevindeyne opened 4 years ago

kevindeyne commented 4 years ago

Maybe use Apache Tika: https://tika.apache.org/1.16/api/org/apache/tika/language/detect/LanguageDetector.html

Example: if we notice a column that contains Japanese characters, generate replacement values that also use Japanese characters.
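
A rough sketch of that idea, assuming we sample a few values from a column. To stay dependency-free it uses the JDK's `Character.UnicodeScript` rather than Tika's `LanguageDetector`, so it detects the writing script, not the language. The class and method names (`ScriptSniffer`, `dominantScript`) are made up for illustration and are not part of Vardøgr.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Vardøgr.
final class ScriptSniffer {

    /**
     * Returns the Unicode script that occurs most often among the letters
     * of the sampled values, or COMMON when the samples contain no letters.
     */
    static UnicodeScript dominantScript(List<String> samples) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        for (String sample : samples) {
            sample.codePoints()
                  .filter(Character::isLetter) // ignore digits, spaces, punctuation
                  .forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));
        }

        UnicodeScript best = UnicodeScript.COMMON;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

With this, `ScriptSniffer.dominantScript(List.of("Brussels", "Gent"))` returns `LATIN`, and a column of hiragana values returns `HIRAGANA`. A generator could then pick replacement characters from the same script.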

kevindeyne commented 3 years ago

I actually want to make sure we don't touch any data, so I don't want to scan columns for specific values.

But I do think adding different languages is a good idea.

Instead, I think I should look at what a column is allowed to contain (based on its collation and encoding) and then generate values from any language that fits. Doing it this way would also keep people from pushing their own biases onto the data.
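
A minimal sketch of the encoding half of this, assuming the column's character set can be mapped to a Java charset name. It never reads column values: it only asks the JDK's `CharsetEncoder.canEncode` which candidate character ranges the charset accepts and draws random values from those. The class name, method names, and the small list of ranges are hypothetical and chosen for illustration; collation is not handled here.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Vardøgr.
final class CharsetAwareGenerator {

    // Illustrative candidate ranges {first, last}, all inside the BMP.
    private static final int[][] RANGES = {
        {0x0061, 0x007A}, // basic Latin lowercase
        {0x00E0, 0x00EF}, // accented Latin-1 letters
        {0x0430, 0x044F}, // Cyrillic lowercase
        {0x03B1, 0x03C9}, // Greek lowercase
        {0x3041, 0x3093}, // Hiragana
        {0xAC00, 0xAC1F}, // a slice of Hangul syllables
    };

    /** Returns the candidate ranges whose characters the charset can all encode. */
    static List<int[]> allowedRanges(String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<int[]> allowed = new ArrayList<>();
        for (int[] range : RANGES) {
            boolean encodable = true;
            for (int cp = range[0]; cp <= range[1] && encodable; cp++) {
                encodable = encoder.canEncode((char) cp);
            }
            if (encodable) {
                allowed.add(range);
            }
        }
        return allowed;
    }

    /** Builds a random value from one randomly chosen range the charset allows. */
    static String randomValue(String charsetName, int length, Random random) {
        List<int[]> allowed = allowedRanges(charsetName);
        if (allowed.isEmpty()) {
            throw new IllegalArgumentException("No candidate range fits " + charsetName);
        }
        int[] range = allowed.get(random.nextInt(allowed.size()));
        StringBuilder value = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            value.append((char) (range[0] + random.nextInt(range[1] - range[0] + 1)));
        }
        return value.toString();
    }
}
```

For example, `CharsetAwareGenerator.randomValue("US-ASCII", 12, new Random())` can only produce basic Latin letters, while a `UTF-8` column may get a value from any of the six ranges. Picking the range at random, rather than from the existing data, is what keeps the generator from looking at real values.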