jenniew opened this issue 4 years ago
The current Python `zoo.pipeline.estimator.Estimator` and the Keras-style model also support only a single batch size, shared by the training data and the validation data during training.
@jenniew @jason-dai I do not think supporting different batch sizes alone can solve the ncf notebook evaluation problem.
In DistriOptimizer we repartition the data RDD into as many partitions as there are nodes, so that we can use zipPartitions
with the models to run validation. So even if we could set the validation dataset's batch size, the original records would still be randomly distributed across batches.
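To illustrate the point above, here is a plain-Python sketch, not the actual DistriOptimizer code: the `TEST_NEG` value, the per-user record layout, and the `random.shuffle` standing in for the repartition are all assumptions.

```python
import random

TEST_NEG = 4  # assumed number of negatives per positive in the NCF eval set
random.seed(0)

# Each user contributes 1 positive record followed by TEST_NEG negatives.
records = [(user, label) for user in range(3) for label in [1] + [0] * TEST_NEG]

# Repartitioning to #nodes partitions redistributes records, so fixed-size
# batches no longer align with one user's group; a shuffle stands in for it.
random.shuffle(records)

batches = [records[i:i + TEST_NEG + 1]
           for i in range(0, len(records), TEST_NEG + 1)]
positives_per_batch = [sum(label for _, label in batch) for batch in batches]
# After the shuffle, a batch may hold 0 or 2+ positives instead of exactly 1,
# which breaks the one-positive-per-batch assumption of the hit metric.
```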
I think it might be easier to find a way to implement the metrics (hit ratio or ndcg) without assuming each batch has exactly 1 positive example.
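As a rough sketch of that idea (hedged: grouping predictions by a per-query id and the binary-relevance NDCG form are my assumptions, not the notebook's exact metric code):

```python
import numpy as np

def hit_ratio_and_ndcg(scores, labels, group_ids, k=10):
    """Compute HitRatio@k and NDCG@k without assuming each batch holds
    exactly one positive: group predictions by query (e.g. user) id
    first, then rank within each group."""
    hits, ndcgs = [], []
    for g in np.unique(group_ids):
        mask = group_ids == g
        s, l = scores[mask], labels[mask]
        order = np.argsort(-s)[:k]          # top-k indices by predicted score
        ranked_labels = l[order]
        hits.append(float(ranked_labels.max() > 0))
        # Binary-relevance NDCG: gain 1/log2(rank + 2) at the first positive.
        pos = np.nonzero(ranked_labels)[0]
        ndcgs.append(1.0 / np.log2(pos[0] + 2) if len(pos) else 0.0)
    return np.mean(hits), np.mean(ndcgs)
```

This way the metric only depends on how records are grouped logically, not on how the optimizer happens to batch them.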
Yes, the order cannot be kept if we coalesce the validation dataset. If we first repartition to #nodes partitions and then add the negatives, would it be possible for the optimizer to skip the coalesce when the RDD already has #nodes partitions? Currently the optimizer cannot do that. But I think the batch size still needs to be test_neg+1 each time if we want to follow the original validation design.
We could implement the metrics to handle multiple positive examples per batch, but the metric values would not match the original results.
In this case, it is hard to reproduce the same metric values as the old ones.
It seems that we do have a distributed implementation of NDCG and HitRatio (https://github.com/intel-analytics/analytics-zoo/blob/master/zoo/src/main/scala/com/intel/analytics/zoo/models/common/Ranker.scala#L113 and https://github.com/intel-analytics/BigDL/blob/master/spark/dl/src/main/scala/com/intel/analytics/bigdl/optim/ValidationMethod.scala#L883)
HitRatio is the same as the hit metric in the NCF case, which requires one positive and test_neg negative samples in each batch. The Zoo NDCG would need the test labels in the ncf test dataset to be changed.
Then how do we guarantee that HitRatio works correctly in BigDL?
There seems to be no usage of it for validation in the BigDL/Zoo repos.
Maybe we can pack the test_neg+1 samples into one element before validation, then use a transformer to unpack them and validate?
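A minimal sketch of that pack/unpack idea, with plain Python lists standing in for the RDD; `TEST_NEG` and the helper names are hypothetical, not existing Zoo APIs:

```python
TEST_NEG = 4  # assumed number of negatives per positive

def pack(samples):
    """Group each positive with its TEST_NEG negatives into one element,
    so any later repartition or coalesce moves the whole group together."""
    assert len(samples) % (TEST_NEG + 1) == 0
    return [samples[i:i + TEST_NEG + 1]
            for i in range(0, len(samples), TEST_NEG + 1)]

def unpack(packed):
    """Transformer applied just before validation: each packed group
    becomes one validation batch of exactly TEST_NEG + 1 samples."""
    for group in packed:
        yield group
```

The key property is that after packing, no shuffle can separate a positive from its negatives, because they travel as one element.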
Our current TFDataSet uses the same batch size for training data and validation data. But in some cases (e.g. ncf), there is a need to set different batch sizes for training and validation. Can we support different batch_size values?
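A toy illustration of what decoupled batching would mean; the `rebatch` helper is purely illustrative, not a TFDataSet API:

```python
def rebatch(records, batch_size):
    """Illustrative only: build fixed-size batches from a flat record list.
    Supporting different batch sizes would mean batching the same records
    with one size for training and another (e.g. test_neg + 1) for
    validation."""
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

records = list(range(12))
train_batches = rebatch(records, 4)  # training batch size
val_batches = rebatch(records, 3)    # e.g. test_neg + 1 with test_neg = 2
```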