Closed — NextGuido closed this issue 4 years ago
This is the final submission code, which aims for the best score, so I trained on all the data; you can split off a validation set yourself. A good train/val split should also help you improve your algorithm.
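For anyone who wants to hold out a validation set, here is a minimal sketch of splitting a COCO-style annotation dict into train/val parts. The function name, ratio, and seed are my own illustration and are not part of this repository:

```python
import random

def split_coco_annotations(ann, val_ratio=0.1, seed=0):
    """Split a COCO-style annotation dict into train/val dicts.

    `ann` must have "images" and "annotations" keys; all other
    top-level keys (e.g. "categories") are copied into both splits.
    """
    rng = random.Random(seed)
    images = list(ann["images"])
    rng.shuffle(images)
    n_val = max(1, int(len(images) * val_ratio))
    val_imgs, train_imgs = images[:n_val], images[n_val:]

    def _subset(imgs):
        ids = {img["id"] for img in imgs}
        out = {k: v for k, v in ann.items()
               if k not in ("images", "annotations")}
        out["images"] = imgs
        # keep only the annotations whose image is in this split
        out["annotations"] = [a for a in ann["annotations"]
                              if a["image_id"] in ids]
        return out

    return _subset(train_imgs), _subset(val_imgs)
```

You would load the competition's training JSON with `json.load`, run this once, and dump the two dicts to separate files referenced by the config's `train` and `val` entries.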
@zhengye1995 Thank you very much for your answer. May I ask: without a validation set, wouldn't training for 70 epochs cause overfitting? I don't know much about this, thanks!
The 70 epochs are only because I did not change the anchor ratios, so boxes with extreme aspect ratios converge slowly. If you add anchor ratios such as 0.1, 10, etc., then 12 or 20 epochs (the 1x or 2x schedules used for COCO) are enough. Under normal circumstances, 1x or 2x is fine.
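The anchor-ratio change described above might look like the following in an mmdetection 1.x-style RPN head config. The default ratios are `[0.5, 1.0, 2.0]`; the extra 0.1 and 10.0 values for very long/flat fabric defects are a sketch of the suggestion, not the repository's actual config:

```python
# Sketch: RPN head with extra anchor ratios for extreme-aspect-ratio
# boxes (mmdetection 1.x parameter names assumed).
rpn_head = dict(
    type='RPNHead',
    in_channels=256,
    feat_channels=256,
    anchor_scales=[8],
    # default is [0.5, 1.0, 2.0]; 0.1 and 10.0 cover very elongated defects
    anchor_ratios=[0.1, 0.5, 1.0, 2.0, 10.0],
    anchor_strides=[4, 8, 16, 32, 64])
```

With anchors that already match the boxes' shapes, the regressor no longer has to learn large corrections, which is why the shorter 1x/2x schedules then suffice.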
@zhengye1995 Thank you very much. One last question: how did you determine that 70 epochs gives the best result? Did you train several models and conclude from experiments that 70 epochs was best?
Maybe 70 epochs is not the best. I trained for 80 epochs and saved a checkpoint every 5 epochs. I submitted the checkpoints from epochs 20, 40, 60, and 80, and the scores ranked 60 > 80 > 40 > 20, so I chose epoch 70. If you have a good validation dataset, you can use it to evaluate the score, but it is usually difficult to build a validation set whose scores track the online test set, so sometimes we just use the online test set to select the epoch.
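The schedule described above (80 epochs, checkpoint every 5) would correspond to config entries like these; this is a sketch using mmdetection 1.x-style option names, not copied from the repository:

```python
# Sketch: save a checkpoint every 5 epochs over an 80-epoch run,
# so epochs 20/40/60/80 are available to submit and compare.
checkpoint_config = dict(interval=5)
total_epochs = 80

# Epochs actually submitted in the experiment described above:
submitted = [20, 40, 60, 80]
```

Picking epoch 70 is then an interpolation between the two best submitted checkpoints (60 and 80), since the score peaked somewhere between them.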
@zhengye1995 Thank you very much, this has really helped me.
Hello, thank you very much for your code. Looking at your config file cascade_rcnn_r50_fpn_70e.py, the validation set is defined as follows:
val=dict(
    type=dataset_type,
    ann_file=data_root + 'annotations/instances_val2017.json',
    img_prefix=data_root + 'val2017/',
    pipeline=test_pipeline),
This seems to be the default dataset path (the COCO-style val2017 annotations), not the fabric defect detection data. So you did not set up a separate validation set — is my understanding correct?