Using same data (in a new domain) for fine-tuning and testing?

selveszero commented 5 years ago

Is it OK to use the same data for fine-tuning and testing?

(Using the same data file for xst and xsu in test.py and s1['test']['path'] and s1['unlab']['path'] in data2.py)

wjko2 commented 5 years ago

It's OK to use the same data for fine-tuning and testing.

ZhihuiChen0903 commented 4 years ago

Hi, there is a big gap between the results I get when using the same data file for xst and xsu in test.py (and for s1['test']['path'] and s1['unlab']['path'] in data2.py) and the results I get when using different training and test data. How should I interpret this gap? And which setup is better?

Looking forward to your reply! Thanks.

JoshuaMathias commented 2 years ago

@ZhihuiChen0903 Using the training data as the test data will generally give much better metrics, since the model has already seen exactly that data. Isn't that the reason for the gap? It's not necessarily bad in practice for an application, but for research purposes it's not good, since it doesn't reliably tell you whether the model will work well on new data, even similar data. You might also cause the model to overfit, which would make it worse in application as well, though since this is unsupervised training, hopefully that's less likely to occur.
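
For anyone who wants to report held-out numbers, one option is to split the target-domain sentences into two disjoint sets before running anything. The snippet below is only a minimal sketch and not part of this repository: the function name `split_unlabeled_and_test` and its parameters are hypothetical, and it assumes the sentences are available as a plain list of strings.

```python
import random


def split_unlabeled_and_test(sentences, test_fraction=0.2, seed=0):
    """Return (unlabeled, test): two disjoint lists drawn from `sentences`.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Drop exact duplicates so the same sentence cannot land on both sides.
    unique = list(dict.fromkeys(sentences))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Hold out at least one sentence for testing.
    n_test = max(1, int(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]
```

The two returned lists could then be written to separate files, with the unlabeled fine-tuning path pointing at the first and the test path at the second. How those files are wired into test.py and data2.py is left to the reader, since that depends on the repository's own code.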

wjko2 / Domain-Agnostic-Sentence-Specificity-Prediction, issue #9