ms-dot-k / User-dependent-Padding

PyTorch implementation of "Speaker-adaptive Lip Reading with User-dependent Padding" (ECCV2022)

There is some doubt about model selection using the test set #1

Closed: jinchiniao closed this issue 1 year ago

jinchiniao commented 1 year ago

Model selection is generally done on the validation set, which is also called the adaptation set in LRW-ID. Selecting the model on the test set leaks information, and the resulting performance comparison is unfair. In train.py:

def validate(v_front, tcn, fast_validate=False, epoch=0, writer=None):
    with torch.no_grad():
        v_front.eval()
        tcn.eval()

        val_data = MultiDataset(
            lrw=args.lrw,
            mode='test',  # validation data is drawn from the test split
            max_v_timesteps=args.max_timesteps,
            augmentations=False,
        )
...

The training code here uses the test set to validate the model, and the selection is then made from those results. In train_udp.py:

def validate(v_front, tcn, udps, fast_validate=False, step=0, writer=None):
    with torch.no_grad():
        v_front.eval()
        tcn.eval()

        val_data = MultiDataset(
            lrw=args.lrw,
            mode='test',  # validation data is again drawn from the test split
            max_v_timesteps=args.max_timesteps,
            augmentations=False,
            subject=args.subject
        )
...

The fine-tuning code here also uses the test set to validate the model and then make the selection. May I ask whether my understanding of your code is off?
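
For reference, a minimal sketch of the change the question seems to be suggesting: drawing the validation data from the adaptation/validation split rather than the test split. The mode value 'val' is an assumption here; the actual split name used by MultiDataset may differ.

def validate(v_front, tcn, fast_validate=False, epoch=0, writer=None):
    with torch.no_grad():
        v_front.eval()
        tcn.eval()

        # Assumed fix: validate on the held-out adaptation/validation split
        # instead of the test split. 'val' is a hypothetical mode name.
        val_data = MultiDataset(
            lrw=args.lrw,
            mode='val',
            max_v_timesteps=args.max_timesteps,
            augmentations=False,
        )
...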

ms-dot-k commented 1 year ago

Thank you for your comments. The uploaded code was the wrong version, and I have uploaded the correct one. Please re-download the 'src/data' folder and 'train~/test~.py'. As we wrote in the paper, we use 5-fold validation to report reliable performance for the 1-, 3-, and 5-minute adaptation experiments. For validation, we used the remaining adaptation set, excluding the 1-, 3-, and 5-minute adaptation data. The best model for each fold is then selected and evaluated on the test set, and we report the mean performance over the 5 folds.
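
As a rough illustration of that protocol (the fold split and the helpers split_adaptation_set, train_one_fold, and evaluate are hypothetical names, not this repository's API):

# Rough sketch of the 5-fold protocol described above; all helper names are
# hypothetical and only illustrate where each data split is used.
test_accuracies = []
for fold in range(5):
    # e.g. the 1/3/5-minute adaptation data vs. the remaining adaptation set
    adapt_data, val_data = split_adaptation_set(fold)
    best_model, best_val_acc = None, 0.0
    for step in range(num_steps):
        model = train_one_fold(adapt_data, step)
        val_acc = evaluate(model, val_data)      # selection on the held-out adaptation data
        if val_acc > best_val_acc:
            best_val_acc, best_model = val_acc, model
    test_accuracies.append(evaluate(best_model, test_data))  # test set only for final evaluation

mean_acc = sum(test_accuracies) / len(test_accuracies)       # reported: mean over the 5 folds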

However, for the experiment using different ratios of adaptation data (10~100%), there is no validation set to use, so we validated the model directly on the test set. In that experiment, the fine-tuning baselines are also selected with the same protocol so that they can be compared with the proposed method.
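
Under the same assumed names as the previous sketch, the 10~100% setting would reduce to something like the following, with the identical test-set-based checkpoint selection applied to both methods (sample_adaptation_data, train_udp, finetune_baseline, and report are hypothetical):

# Hypothetical sketch: no validation split exists in this setting, so checkpoint
# selection is done on the test set for both methods, keeping the protocol identical.
for ratio in [i / 10 for i in range(1, 11)]:          # 10% ... 100% of the adaptation data
    adapt_subset = sample_adaptation_data(ratio)
    for method in (train_udp, finetune_baseline):      # proposed method and fine-tuning baseline
        checkpoints = method(adapt_subset)             # checkpoints produced during adaptation
        best = max(checkpoints, key=lambda m: evaluate(m, test_data))  # selection on the test set
        report(method.__name__, ratio, evaluate(best, test_data))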

jinchiniao commented 1 year ago

Thank you for your answers and updates to the code.