pity2003 opened this issue 6 years ago
It's correct (and a flaw of the current implementation) that the validation in the default implementation is not optimal. It's possible that this leads to overfitting, or at least to a biased estimate of the net's performance. I'm not sure if there is a way to change this such that the implementation remains backward compatible.
My solution is to add a parameter to the `__call__` method of `BaseDataProvider`. This parameter decides whether the current loading is for validation or for training/prediction. If the loading is for validation, remove the corresponding files from `self.data_files` and reset `self.file_idx = -1`. After this, any mini-batch will only draw from the remaining files. In this case, however, the validation set has to be loaded before any training data.
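A minimal sketch of what I mean, assuming a provider along the lines of `BaseDataProvider`; the `validation` parameter and the `_load`/`_cycle_file` helpers are hypothetical placeholders, not the current API:

```python
class BaseDataProvider:
    """Sketch: a `validation` flag on __call__ that carves the requested
    files out of self.data_files, so later training batches can never
    draw them again."""

    def __init__(self, data_files):
        self.data_files = list(data_files)
        self.file_idx = -1

    def _cycle_file(self):
        # Advance the cursor through the remaining files, wrapping around.
        self.file_idx = (self.file_idx + 1) % len(self.data_files)
        return self.data_files[self.file_idx]

    def _load(self, files):
        # Placeholder; the real provider would read images and labels here.
        return files

    def __call__(self, n, validation=False):
        if validation:
            # Must be called before any training batch is drawn: the first
            # n files become the validation set and are removed from the
            # pool, then the cursor is reset.
            val_files = self.data_files[:n]
            self.data_files = self.data_files[n:]
            self.file_idx = -1
            return self._load(val_files)
        return self._load([self._cycle_file() for _ in range(n)])


provider = BaseDataProvider(["img_%02d.png" % i for i in range(10)])
val_set = provider(2, validation=True)  # reserves img_00, img_01
batch = provider(4)                     # drawn only from img_02..img_09
```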
Maybe there is a better solution than mine.
Thanks.
I was thinking about an approach similar to Keras, where one has to provide two data providers: one for training and one for validation. This would allow for a clean separation.
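Roughly like this, reusing the hypothetical `BaseDataProvider` sketch from the comment above (the splitting logic is just an illustration, not the current tf_unet API):

```python
import random

# Split the file list once, up front, and give each subset its own
# provider, so training and validation draw from disjoint pools.
files = ["img_%03d.png" % i for i in range(100)]
random.seed(0)  # make the split reproducible
random.shuffle(files)

split = int(0.8 * len(files))
train_provider = BaseDataProvider(files[:split])  # 80 training files
valid_provider = BaseDataProvider(files[split:])  # 20 validation files

x_train = train_provider(16)  # can never contain a validation file
x_valid = valid_provider(16)  # and vice versa
```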
I noticed that both the validation data and the mini-batch training data are drawn from the same set of training images, so the validation and training sets may overlap. Will this lead to overfitting?
Thanks.