tinybike / argus

[DEPRECATED] Automated Q/A prototype
http://pasky.or.cz:5500

Clean up train/test splits #1

Closed by pasky 8 years ago

pasky commented 8 years ago

We should do a train/test split on input data before processing anything, not just when reporting the results, to ensure that proper data hygiene is kept.
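A minimal sketch of what splitting up front could look like, assuming the input is a plain list of records. The function name `split_train_test`, the 20% test fraction, and the fixed seed are illustrative assumptions, not taken from the argus code base. The point is only that the split happens once, deterministically, before any feature extraction or training touches the data.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it once, before any processing.

    Hypothetical helper: the name, default fraction, and seed are assumptions.
    Returns (train, test).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder inputs standing in for the real question records.
questions = [f"q{i}" for i in range(10)]
train, test = split_train_test(questions)

# Everything downstream (feature extraction, training) should only see `train`;
# `test` is set aside until the final evaluation.
```

Fixing the seed means every pipeline stage sees the same partition, so a record cannot drift from test into train between runs.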

(At a later time, we should also further split the training set into trainmodel and val, train sub-classifiers such as the sentiment classifier on trainmodel, and measure their performance on val rather than on test, so that we don't overfit to the test set through parameter tuning. However, we have too little data to afford that at this point, so it's just something to bear in mind for now.)
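The trainmodel/val idea might be sketched roughly as follows. Everything here is a toy assumption: the data is a made-up list of `(score, label)` pairs, the "sub-classifier" is a single learned threshold rather than a real sentiment model, and the helper names are hypothetical. It only illustrates the data flow: fit on trainmodel, measure on val, and leave test alone.

```python
import random


def split_trainmodel_val(train, val_fraction=0.25, seed=1):
    """Split the training set again into (trainmodel, val). Hypothetical helper."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(round(len(shuffled) * val_fraction)))
    return shuffled[n_val:], shuffled[:n_val]


def fit_threshold(examples):
    """Toy sub-classifier: midpoint between the mean scores of the two classes."""
    pos = [score for score, label in examples if label]
    neg = [score for score, label in examples if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, examples):
    """Fraction of examples where (score >= threshold) matches the label."""
    hits = sum((score >= threshold) == label for score, label in examples)
    return hits / len(examples)


# Made-up training data: (score, label) pairs. The test set is not involved here.
train = [(i / 10, i >= 5) for i in range(8)]

trainmodel, val = split_trainmodel_val(train)
threshold = fit_threshold(trainmodel)    # learn on trainmodel only
val_accuracy = accuracy(threshold, val)  # tune and compare against val, not test
```

With this arrangement, any parameter tuning is judged by `val_accuracy`, and the test set is consulted only once for the final reported numbers.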

Silvicek commented 8 years ago

Dealt with a long time ago.