ltflores / csc-869-mlog

Automatically exported from code.google.com/p/csc-869-mlog

Fix dependent cross-validation #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
From Oren's email:

One quick question: in the 10-fold cross-validation, did we make sure
that no person is shared between two sections? I mean, if we just
divide into 10 sections by tweet, then we may have tweet1 and tweet2
of the same congressmanA in two different sections. In this case, we
may get a good result in the cross-validation simply because the
classifier can find similarity between tweets of congressmanA (e.g. if
tweet1 is in the verification section and tweet2 is in one of the 9
training sections, it may simply learn that tweet1 and tweet2 are
similar in language, and we'll get a good but misleading score).

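This needs to be fixed with custom code that distributes tweets into sections by congressman, so that all tweets from one person end up in the same section.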
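
A minimal sketch of one way to do this, assuming each tweet comes with an author label. The names `grouped_folds` and `train_test_splits` are hypothetical and not taken from the project's code, and the greedy balancing is only one possible strategy:

```python
from collections import defaultdict


def grouped_folds(authors, k=10):
    """Assign tweet indices to k folds so that all tweets by the same
    author land in the same fold.

    `authors` is a list where authors[i] is the author of tweet i.
    Greedy strategy: place the author with the most tweets first, always
    into the fold that currently holds the fewest tweets.
    """
    by_author = defaultdict(list)
    for idx, author in enumerate(authors):
        by_author[author].append(idx)

    folds = [[] for _ in range(k)]
    # Largest groups first; ties broken by author name for determinism.
    ordered = sorted(by_author.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _author, idxs in ordered:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(idxs)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With this split, no congressman appears on both the training and the held-out side of any fold, so the score can no longer benefit from similarity between tweets of the same person. Fold sizes may be uneven if some authors tweet far more than others.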

Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:55

GoogleCodeExporter commented 9 years ago

Original comment by markus.neubrand on 6 Apr 2011 at 12:10