louismartin / email-classification-challenge

Altegrad challenge in collaboration w/ Linagora
https://inclass.kaggle.com/c/master-data-science-mva-data-competition-2017

Too long body #5

Open zaccharieramzi opened 7 years ago

zaccharieramzi commented 7 years ago

Some emails have very long bodies. I don't think the useful information in them is carried by all of the words, but rather by the length of the email. Proposed solution: pass the real length of the email as a feature, and crop some of the content in the middle (the middle, because the informative words are usually either at the beginning or at the end).
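A minimal sketch of the proposed cropping, assuming bodies are split into words on whitespace. The function name `crop_body` and the `head`/`tail` budgets (100 words each) are hypothetical choices for illustration, not values from the repository. It keeps the beginning and end of the body and returns the original word count separately, so the length can still be used as a feature.

```python
def crop_body(body: str, head: int = 100, tail: int = 100) -> tuple[str, int]:
    """Hypothetical helper: keep the first `head` and last `tail` words of a body.

    Returns the cropped text and the ORIGINAL word count, so the true length
    can be passed to the model as its own feature.
    """
    words = body.split()
    n_words = len(words)
    if n_words > head + tail:
        # Drop the middle; guard tail == 0 because words[-0:] would return everything
        words = words[:head] + (words[-tail:] if tail > 0 else [])
    return " ".join(words), n_words
```

For example, `crop_body("a b c d e f", head=2, tail=2)` returns `("a b e f", 6)`: the middle words are dropped while the full length is kept.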

Doing this would also keep very long emails from polluting the bag-of-words (BoW) representation.
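To illustrate the BoW point, here is a hedged sketch of a feature builder, assuming simple lowercase whitespace tokenization and raw word counts. The name `bow_with_length`, the reserved `"<length>"` key and the use of `log1p` to compress the length are all illustrative assumptions. Counts come only from the head and tail of the body, so one very long email cannot inflate the counts of words from its middle, while its full length survives as a separate numeric feature.

```python
import math
from collections import Counter


def bow_with_length(body: str, head: int = 100, tail: int = 100) -> dict[str, float]:
    """Hypothetical feature builder: BoW counts over the cropped body, plus length.

    "<length>" is an arbitrary reserved key chosen for this sketch; a real
    pipeline would keep the length in its own column instead.
    """
    words = body.lower().split()
    n_words = len(words)
    if n_words > head + tail:
        # Keep only the beginning and the end, where informative words tend to be
        words = words[:head] + (words[-tail:] if tail > 0 else [])
    feats: dict[str, float] = dict(Counter(words))
    # Log-scaled full length, so very long bodies do not dominate this feature either
    feats["<length>"] = math.log1p(n_words)
    return feats
```

With `bow_with_length("spam " * 1000 + "hello", head=2, tail=1)`, the word `spam` is counted only twice rather than 1000 times, and the 1001-word length is kept as `log1p(1001)`.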