snakers4 / emoji_sentiment

Dataset EDA #1

Closed: snakers4 closed this issue 5 years ago

snakers4 commented 5 years ago

Used this pipeline to process ~1 year's worth of tweets from here.

The emoji set from TorchMoji / DeepMoji was chosen as the basis.

Next steps

snakers4 commented 5 years ago

FastText language detection probabilities

By setting a threshold of around 0.8, we filter out about 1/3 of the data where language detection is ambiguous: 24,177,435 => 16,221,714 (7,955,721 removed, ~33%).
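A minimal sketch of the thresholding step in Python, assuming each text is scored by a predictor that returns `(labels, probabilities)` in the shape produced by the fastText Python bindings' `model.predict(text, k=1)`, e.g. `(('__label__ru',), array([0.97]))`. The function name `filter_by_lang_prob` and the stub predictor are hypothetical, and treating the 0.8 cut-off as inclusive (`>=`) is an assumption; the thread does not say which model or comparison was used.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# (labels, probabilities), the shape fastText's model.predict(text, k=1) returns.
Prediction = Tuple[Sequence[str], Sequence[float]]


def filter_by_lang_prob(
    texts: Iterable[str],
    predict: Callable[[str], Prediction],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: keep texts whose top language prediction
    has probability >= threshold (inclusive cut-off is an assumption).

    Returns (text, language_code, probability) triples.
    """
    kept = []
    for text in texts:
        # fastText's predict rejects input containing newlines, so flatten them.
        labels, probs = predict(text.replace("\n", " "))
        lang = labels[0].replace("__label__", "", 1)
        prob = float(probs[0])
        if prob >= threshold:
            kept.append((text, lang, prob))
    return kept


# Stub predictor standing in for a loaded fastText model's .predict method.
def fake_predict(text: str) -> Prediction:
    table = {
        "привет мир": (("__label__ru",), [0.97]),
        "ok lol": (("__label__en",), [0.41]),
    }
    return table[text]


kept = filter_by_lang_prob(["привет мир", "ok lol"], fake_predict)
```

With a real model, `fake_predict` would be replaced by the loaded model's `predict` method; short, slang-heavy tweets like `"ok lol"` are the kind of input that tends to get low-confidence predictions and fall below the threshold.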