That's a good question. In general, we have different aims for training and testing.
During training, we aim for the model to learn the underlying patterns and representations of the data distribution. If we normalize the train set, we artificially alter that distribution and may introduce biases into the model's training process, which is unwanted.
During testing, we aim to evaluate the model's performance. It is important to normalize the validation set to ensure consistent and fair evaluation.
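For readers unfamiliar with the operation being discussed, here is a minimal sketch of per-channel normalization in plain Python. It is not the code from `build.py`: the helper name `normalize_pixel` is hypothetical, and the mean/std values are the commonly used ImageNet statistics, assumed here only for illustration.

```python
# Commonly used ImageNet per-channel statistics (RGB, pixel values in [0, 1]).
# Assumption: the repository may use different values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Return (value - mean) / std for each channel of one RGB pixel."""
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
centered = normalize_pixel(IMAGENET_MEAN)
print(centered)
```

In practice the same arithmetic is applied to every pixel of an image tensor, typically as one step in the split's transform pipeline.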
Hi,
I saw that data is normalized for the `val` split, but not for the `train` split: https://github.com/ziplab/LITv2/blob/b0a35dec6d7d1244401bf428cccaa82df9ccd813/classification/data/build.py#L134
Shouldn't normalization be applied to both `val` and `train` splits?
Thank you for your help again.