Some emails have very long bodies. I suspect that most of the information they carry is not in the individual words but in the length of the email itself. Solution: pass the real length of the email as a feature, and possibly crop part of the content in the middle (the middle, because the informative words usually appear either at the beginning or at the end).
Cropping also helps avoid polluting the bag-of-words (BoW) with the many uninformative words in the middle of long emails.
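A minimal sketch of the idea, assuming whitespace tokenization and a plain word-count BoW. The helper names `crop_middle` and `email_features` are hypothetical, the `head`/`tail` defaults of 100 words are arbitrary, and the "real length" is measured here in words (characters would work just as well). The length is computed on the full body before cropping, so the classifier still sees how long the email really was.

```python
from collections import Counter


def crop_middle(words: list[str], head: int, tail: int) -> list[str]:
    """Drop the middle of a token list, keeping the first `head` and last `tail` tokens."""
    if len(words) <= head + tail:
        return list(words)  # short email: nothing to crop
    # Use an explicit start index so that tail=0 keeps no trailing words
    # (words[-0:] would return the whole list).
    return words[:head] + words[len(words) - tail:]


def email_features(body: str, head: int = 100, tail: int = 100) -> tuple[Counter, int]:
    """Return a BoW over the cropped body plus the real (uncropped) length in words."""
    # Assumption: lowercasing + whitespace split stands in for a real tokenizer.
    words = body.lower().split()
    real_length = len(words)  # measured on the full body, before cropping
    bow = Counter(crop_middle(words, head, tail))
    return bow, real_length


if __name__ == "__main__":
    body = "hello " + "filler " * 1000 + "goodbye"
    bow, n = email_features(body, head=1, tail=1)
    print(bow, n)  # Counter({'hello': 1, 'goodbye': 1}) 1002
```

The 1000 "filler" words never reach the BoW, while the length feature (1002) still records that the email was long. In practice the length would be appended to the BoW vector as one extra column, possibly log-scaled so very long emails do not dominate.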