plemeri / InSPyReNet

Official PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (ACCV 2022)
MIT License

Unable to reproduce models and Increasing validation loss #34

Closed malinamanolache closed 1 year ago

malinamanolache commented 1 year ago

Hello and thank you for the great work.

While working with this project I came across a few problems and I hope you could give me some suggestions.

1. Unable to reproduce models

First, I tried reproducing one of the LR+HR trainings, InSPyReNet_SwinB_HU (HRSOD-TR and UHRSD-TR), but I do not obtain the same results. I gathered the results in the following table:

| Dataset | Model | Sm | mae | adpEm | maxEm | avgEm | adpFm | maxFm | avgFm | wFm | mBA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DUTS-TE | yours | 0.939 | 0.0221 | 0.931 | 0.9657 | 0.951 | 0.865 | 0.936 | 0.908 | 0.901 | 0.735 |
| DUTS-TE | mine | 0.882 | 0.0396 | 0.897 | 0.909 | 0.889 | 0.799 | 0.847 | 0.8185 | 0.799 | 0.6437 |
| HRSOD-TE | yours | 0.9565 | 0.0173 | 0.9527 | 0.9746 | 0.9641 | 0.9090 | 0.9564 | 0.933 | 0.9234 | 0.7714 |
| HRSOD-TE | mine | 0.9136 | 0.0322 | 0.9023 | 0.9370 | 0.9199 | 0.815 | 0.8934 | 0.8579 | 0.8304 | 0.6412 |
| UHRSD-TE | yours | 0.9528 | 0.02038 | 0.9223 | 0.9708 | 0.9617 | 0.9029 | 0.9576 | 0.9431 | 0.9331 | 0.7897 |
| UHRSD-TE | mine | 0.9202 | 0.0332 | 0.9133 | 0.9477 | 0.9316 | 0.8615 | 0.9179 | 0.8967 | 0.8713 | 0.6621 |

Although the metrics are fairly close, the predictions from the model I trained are far inferior in quality to those of the provided model. I also tried training the PlusUltraHR model and experienced the same thing. Why could this happen? Why can I not reproduce the model?
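For reference, here is a minimal sketch of how the `mae` column is typically computed for saliency maps, assuming `preds` and `gts` are lists of grayscale maps in [0, 1]; this is the generic definition, not the repository's evaluation code:

```python
import numpy as np

def mean_absolute_error(preds, gts):
    """Average per-image MAE between predicted saliency maps and ground-truth masks."""
    scores = [np.abs(p.astype(np.float64) - g.astype(np.float64)).mean()
              for p, g in zip(preds, gts)]
    return float(np.mean(scores))
```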

2. Increasing loss during validation

Additionally, I added validation to the training script in order to monitor the model's performance during training:


```python
for epoch in epoch_iter:

    # ---- training ----
    if args.local_rank <= 0 and args.verbose is True:
        step_iter = tqdm.tqdm(enumerate(train_loader, start=1), desc='Iter', total=len(
            train_loader), position=1, leave=False, bar_format='{desc:<5.5}{percentage:3.0f}%|{bar:40}{r_bar}')
        if args.device_num > 1 and train_sampler is not None:
            train_sampler.set_epoch(epoch)
    else:
        step_iter = enumerate(train_loader, start=1)

    train_loss = []

    for i, sample in step_iter:
        optimizer.zero_grad()
        if opt.Train.Optimizer.mixed_precision is True and scaler is not None:
            with autocast():
                sample = to_cuda(sample)
                out = model(sample)

            scaler.scale(out['loss']).backward()
            scaler.step(optimizer)
            scaler.update()
            scheduler.step()
        else:
            sample = to_cuda(sample)
            out = model(sample)
            out['loss'].backward()
            optimizer.step()
            scheduler.step()

        if args.local_rank <= 0 and args.verbose is True:
            step_iter.set_postfix({'loss': out['loss'].item()})

        train_loss.append(out['loss'].item())

    average_loss = np.mean(train_loss)

    # ---- validation (the part I added) ----
    step_iter_test = enumerate(test_loader, start=1)

    # model.eval()
    df = df.append({'epoch': epoch, 'scope': 'train', 'set': 'all', 'metric': 'loss', 'value': average_loss}, ignore_index=True)
    writer.add_scalar('Train/loss', average_loss, epoch)

    # per-test-set accumulators
    mse_sum = {}
    loss_sum = {}
    count = {}

    with torch.no_grad():
        for i, sample in step_iter_test:
            sample = to_cuda(sample)  # adds 50 MB to GPU memory
            set_name = sample['set'][0]
            out = model(sample)
            loss = out['loss'].detach().cpu().numpy()
            pred = to_numpy(out['pred'], sample['shape'])
            gt = to_numpy(out['gt'], sample['shape'])
            mse = compute_mse(predict=pred, alpha=gt)

            if set_name not in mse_sum:
                mse_sum[set_name] = 0.0
                loss_sum[set_name] = 0.0
                count[set_name] = 0
            mse_sum[set_name] += mse
            loss_sum[set_name] += loss
            count[set_name] += 1

        for set_name in mse_sum:
            mean_mse = mse_sum[set_name] / count[set_name]
            mean_loss = loss_sum[set_name] / count[set_name]
            df = df.append({'epoch': epoch, 'scope': 'valid', 'set': set_name, 'metric': 'mse', 'value': mean_mse}, ignore_index=True)
            df = df.append({'epoch': epoch, 'scope': 'valid', 'set': set_name, 'metric': 'loss', 'value': mean_loss}, ignore_index=True)
            writer.add_scalar('Valid/' + set_name + '/mse', mean_mse, epoch)
            writer.add_scalar('Valid/' + set_name + '/loss', mean_loss, epoch)

    # persist the per-epoch metrics and switch back to training mode
    df_path = os.path.join(opt.Train.Checkpoint.checkpoint_dir, f'{log_id}.json')
    df.to_json(df_path, orient='records')

    model.train()
```

InSPyReNet_SwinB_HU training & validation

For the InSPyReNet_SwinB_HU training, the validation set I used is DUTS-TE. The training loss decreases steadily, but the validation loss starts increasing after some epochs:

[Figures: training loss and validation loss curves for InSPyReNet_SwinB_HU]

My assumptions were the following: either the model is overfitting, or the data distribution of the training sets is too different from that of the test set.
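As a side note, a common way to react to a rising validation loss is to keep the checkpoint with the lowest validation loss seen so far; here is a minimal sketch, reusing `model` and `mean_loss` from the snippet above (this is not part of the original training script):

```python
import torch

best_val_loss = float('inf')

# at the end of each epoch, after mean_loss has been computed for the validation set
if mean_loss < best_val_loss:
    best_val_loss = mean_loss
    torch.save(model.state_dict(), 'best_val.pth')  # keep the best-so-far weights
```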

Overfitting Check

To check whether overfitting is the problem, I trained an LR model (using the Plus_Ultra_LR config) on 43K samples ('MSRA-10K', 'HRSOD-TR', 'HRSOD-TE', 'ECSSD', 'HKU-IS', 'PASCAL-S', 'DAVIS', 'UHRSD-TR', 'UHRSD-TE', 'FSS-1000', 'DIS5K') and validated it after each epoch on 300 images from DUTS-TE. I chose an LR model and only a subset of DUTS-TE for faster training. The validation loss still increases:

[Figure: validation loss curve for the Plus_Ultra_LR model on the DUTS-TE subset]

Overfitting typically occurs when the training set is small or the model is too complex. After this experiment with 43K images, I doubt that overfitting is responsible for the increase in validation loss.
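For context, validating on a fixed subset (e.g. the 300 DUTS-TE images mentioned above) can be done with `torch.utils.data.Subset`; a minimal sketch, where `duts_te` stands in for the full DUTS-TE dataset object:

```python
from torch.utils.data import DataLoader, Subset

# keep only the first 300 samples so per-epoch validation stays cheap
val_subset = Subset(duts_te, indices=range(300))
test_loader = DataLoader(val_subset, batch_size=1, shuffle=False, num_workers=4)
```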

Data Distribution Check

I was also thinking that the difference in data distribution between the training sets might be too large, and that the model struggles to find an optimum that accommodates all cases, making it hard to generalize. To test this, I decided to train an LR model on UHRSD2K-TR only for 150 epochs and validate it on several test sets:

[Figures: validation loss curves on UHRSD2K-TE, HRSOD-TE, and PASCAL-S]

I was expecting the loss to decrease for UHRSD2K-TE and to increase for HRSOD-TE and PASCAL-S, but the validation loss increases for all of them. Alongside the experiments above, I have trained InSPyReNet with different configurations and datasets, and in every case the validation loss increases. What could the problem be? Why is the validation loss always increasing?
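For reference, the per-dataset bookkeeping in the validation loop above relies on each sample carrying a `set` entry. Below is a minimal sketch of one way to combine several test sets while keeping that label, assuming each underlying dataset returns a dict per sample; the wrapper and the dataset variables are hypothetical, not the repository's data pipeline:

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class NamedSet(Dataset):
    """Wrap a dataset so every sample dict also carries its dataset name."""
    def __init__(self, dataset, name):
        self.dataset = dataset
        self.name = name

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        sample = self.dataset[idx]
        sample['set'] = self.name  # read back as sample['set'][0] after collation
        return sample

# hypothetical dataset objects for the test sets used above
test_sets = {'UHRSD2K-TE': uhrsd_te, 'HRSOD-TE': hrsod_te, 'PASCAL-S': pascal_s}
test_loader = DataLoader(
    ConcatDataset([NamedSet(ds, name) for name, ds in test_sets.items()]),
    batch_size=1, shuffle=False)
```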

plemeri commented 1 year ago

Did you solve the problem? For your information, we trained our model for the high-resolution configuration on 8 Titan RTX GPUs with 1 sample per device, i.e., a total batch size of 8. We also tried a single GPU and did not run into any trouble like the above.
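For clarity, with one sample per device the effective batch size equals the number of processes; here is a minimal, generic PyTorch sketch of that setup (not the repository's exact launcher, and `train_dataset` is a placeholder):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# assumes the process group was already initialized, e.g. via torchrun
world_size = dist.get_world_size()                 # 8 in the authors' setup
train_sampler = DistributedSampler(train_dataset, shuffle=True)
train_loader = DataLoader(train_dataset, batch_size=1, sampler=train_sampler)
effective_batch_size = 1 * world_size              # 8 samples per optimizer step
```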

Also, I keep noticing that some people have training issues that I cannot reproduce myself. I cannot be sure, but the problem might come from the CUDA version, the PyTorch version, accidentally using half precision, a dataset problem, or something else.
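A quick way to rule out the environment-related causes listed here is to print the relevant versions and the default precision before training; this is a small sketch using only standard PyTorch calls:

```python
import torch

print('torch version :', torch.__version__)
print('CUDA (build)  :', torch.version.cuda)
print('cuDNN         :', torch.backends.cudnn.version())
print('default dtype :', torch.get_default_dtype())  # should be torch.float32
# half precision in the training snippet above is controlled by the config flag
# opt.Train.Optimizer.mixed_precision
```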

I would like to ask you not to modify a single script in our repository and to try to reproduce the basic model first, to make sure that our code works fine on your machine. The configuration you are looking for is InSPyReNet_SwinB.yaml. Please do not change any code and just use DUTS-TR for training. Then evaluate on the other benchmarks, including UHRSD-TE. If you can reproduce our results, then something you changed is causing the problem.
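To double-check that an unmodified configuration is being used, one option is to load the YAML and inspect its Train section before launching; a minimal sketch with PyYAML, where the file path is an assumption and the keys shown are the ones that appear in the training snippet above:

```python
import yaml

# adjust the path to wherever the unmodified config lives in your checkout
with open('configs/InSPyReNet_SwinB.yaml') as f:
    cfg = yaml.safe_load(f)

print('mixed_precision:', cfg['Train']['Optimizer'].get('mixed_precision'))
print('checkpoint_dir :', cfg['Train']['Checkpoint'].get('checkpoint_dir'))
```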

I would also like to mention that I did not train our model many times to produce the best result for the paper. I trained it once, tested it on various GPU servers, and verified that our method consistently produced almost identical results. So if you solve the problem above, I can guarantee that you will get the results you expected. Don't give up on your project, and I'll help as much as I can.