j6mes / nlpj2017-fnc-ensemble

Ensemble Classifier for fake news challenge 2017
Apache License 2.0

Get IDF values for TF-IDF from training data only for XXW classifier (currently uses entire data set as corpus) #1

Closed j6mes closed 7 years ago

andreasvlachos commented 7 years ago

What do you mean? If we are doing it on the training/dev data, we have to do it on test; otherwise we shouldn't.

j6mes commented 7 years ago

It was just a mental note. Might need to fix this: the IDF values are computed on the entire dataset (including our hold-out dev set). When we have new documents, should the IDF come from the training set?
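One way to picture the fix is sketched below: the IDF table is fitted on the training documents only, and the same table is then reused to weight dev, test, or any new documents. This is a minimal illustration, not the XXW classifier's actual code. The function names `fit_idf` and `tfidf`, the whitespace tokenization, the toy documents, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute smoothed IDF weights from the training documents only."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    # Smoothed IDF (an assumed choice): log((1 + N) / (1 + df)) + 1
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }


def tfidf(doc, idf):
    """Weight raw term counts by the fitted IDF; terms unseen in training are dropped."""
    counts = Counter(doc.lower().split())
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


# Hypothetical toy data standing in for the real training and hold-out splits.
train_docs = ["police confirm the report", "the report is fake"]
dev_docs = ["officials deny the fake report"]

idf = fit_idf(train_docs)                        # statistics come from training data only
dev_vectors = [tfidf(d, idf) for d in dev_docs]  # dev/test documents reuse the same IDF
```

With this split, terms that appear only in the hold-out documents (here "officials" and "deny") never influence the IDF values, so the dev set stays unseen from the feature extractor's point of view.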

On Tue, May 30, 2017 at 8:54 AM, Andreas Vlachos notifications@github.com wrote:

What do you mean? If we are doing it on the training/dev data, we have to do it on test; otherwise we shouldn't.
