zhaodongsun / contrast-phys

[TPAMI & ECCV 2022] Contrast-Phys & Contrast-Phys+ for facial video-based remote physiological signal measurement
https://ieeexplore.ieee.org/document/10440521
MIT License

Why Does the Model Performance Deteriorate Over Time? #12

Closed BugMaker2002 closed 5 months ago

BugMaker2002 commented 5 months ago

A week ago, I trained a model that worked well on the test set and saved the corresponding weights. But now, when I use those weights again for inference (the test set has not changed), the results are poor. Why?

In addition, when I ran test.py a week ago, I saved the predicted and true values of each subject in a .npy file. When I now evaluate r, mae, and rmse directly on those .npy files, the results are normal (very good). But when I re-run prediction with the same model weights I used at that time and then compute r, mae, and rmse, the results become very bad.
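For reference, this is roughly how I evaluate from a saved file (a sketch; the filename is just an example, and the keys match the results dict saved in my prediction code posted below):

import numpy as np

# load one subject's saved predictions (a dict with 'rppg_list' and 'bvp_list')
results = np.load('results/2/5/subject1.npy', allow_pickle=True).item()
rppg_list = results['rppg_list']  # predicted rPPG clips
bvp_list = results['bvp_list']    # ground-truth BVP clips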

Moreover, I re-downloaded the code from the official repository, trained and tested again from scratch, and found that the results were still very poor.

I wonder what is going on here?

zhaodongsun commented 5 months ago

The problem could be in the post-processing or in the inputs. Since you use the same weights, the model should be identical in your two tests. Please check whether your input frames are the same in the two tests, and also check whether your post-processing, such as filtering and heart rate calculation, is the same.
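For concreteness, a typical rPPG post-processing chain looks roughly like this (a sketch using scipy and numpy; bandpass and hr_from_fft are illustrative helpers, not necessarily the exact utilities this repo uses):

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(sig, fs, low=0.6, high=4.0, order=2):
    # keep only frequencies in a plausible heart-rate band (36-240 bpm)
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype='bandpass')
    return filtfilt(b, a, sig)

def hr_from_fft(sig, fs):
    # estimate heart rate (bpm) as the dominant frequency of the signal
    sig = sig - np.mean(sig)
    n = len(sig) * 4                       # zero-pad for finer frequency resolution
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    power = np.abs(np.fft.rfft(sig, n=n))
    band = (freqs >= 0.6) & (freqs <= 4.0)
    return freqs[band][np.argmax(power[band])] * 60

If the filter band, the FFT resolution, or the sampling rate differs between two runs, the resulting heart rates (and hence mae/rmse/r) can change even with identical model outputs.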

BugMaker2002 commented 5 months ago

I set train_exp_num=2 here because train_exp_num=2 is the training run from a week ago that gave normal results, so the test split I used is the one created for that run. The model weights I used are from epoch 29 of that training, which is also the epoch used in that earlier test. With the following code, the results are terrible and make no sense, as shown in the figure below. (I can guarantee that my code for r, mae, and rmse has not changed.)

import os
import json
import numpy as np
import h5py
import torch
import torch.nn as nn
from sacred import Experiment
from sacred.observers import FileStorageObserver
from PhysNetModel import PhysNet  # model definition from this repo

ex = Experiment('model_pred', save_git_info=False)

@ex.config
def my_config():
    e = 29 # the model checkpoint at epoch e
    train_exp_num = 2 # the training experiment number
    train_exp_dir = './results/%d'%train_exp_num # training experiment directory

    # to fit a transformer architecture, I changed the clip length from 30 s to 10 s here
    time_interval = 30 # get rppg for 30-s video clips; clips that are too long may run out of memory

    ex.observers.append(FileStorageObserver(train_exp_dir))

    if torch.cuda.is_available():
        device = torch.device('cuda')
        torch.backends.cudnn.enabled = True
        torch.backends.cudnn.benchmark = True

    else:
        device = torch.device('cpu')

@ex.automain
def my_main(_run, e, train_exp_dir, device, time_interval):

    mae_loss_func = nn.L1Loss().to(device)
    mse_loss_func = nn.MSELoss().to(device)

    # load test file paths
    test_list = list(np.load(train_exp_dir + '/test_list.npy'))
    pred_exp_dir = train_exp_dir + '/%d'%(int(_run._id)) # prediction experiment directory

    with open(train_exp_dir+'/config.json') as f:
        config_train = json.load(f)

    model = PhysNet(config_train['S'], config_train['in_ch']).to(device).eval()

    model.load_state_dict(torch.load(train_exp_dir+'/epoch%d.pt'%(e), map_location=device)) # load weights to the model

    @torch.no_grad()
    def dl_model(imgs_clip):
        # model inference
        img_batch = imgs_clip
        img_batch = img_batch.transpose((3,0,1,2))
        # add a new batch dimension at the front of img_batch (batch size = 1)
        img_batch = img_batch[np.newaxis].astype('float32')
        img_batch = torch.tensor(img_batch).to(device)

        rppg = model(img_batch)[:,-1, :] # (1, 5, T) -> (1, T)
        rppg = rppg[0].detach().cpu().numpy()
        return rppg

    for h5_path in test_list:
        h5_path = str(h5_path)

        with h5py.File(h5_path, 'r') as f:
            imgs = f['imgs']
            subject_name = os.path.basename(h5_path)[:-3]
            bvp_path = f"/share2/data/zhouwenqing/UBFC_rPPG/dataset2/{subject_name}/ground_truth.txt"
            bvp = np.loadtxt(bvp_path).reshape((-1, 1))
            # bvppeak = f['bvp_peak']
            fs = config_train['fs']

            # duration is in seconds; fs is frames per second
            duration = np.min([imgs.shape[0], bvp.shape[0]]) / fs
            num_blocks = int(duration // time_interval)
            # slice num_blocks consecutive clips out of the whole video (consecutive in how they are cut from the original video)
            rppg_list = []
            bvp_list = []
            # bvppeak_list = []
            for b in range(num_blocks):
                rppg_clip = dl_model(imgs[b*time_interval*fs:(b+1)*time_interval*fs])
                rppg_list.append(rppg_clip)

                bvp_list.append(bvp[b*time_interval*fs:(b+1)*time_interval*fs])
                # bvppeak_list.append(bvppeak[b*time_interval*fs:(b+1)*time_interval*fs])

            rppg_list = np.array(rppg_list)
            bvp_list = np.array(bvp_list)
            # bvppeak_list = np.array(bvppeak_list)
            # results = {'rppg_list': rppg_list, 'bvp_list': bvp_list, 'bvppeak_list':bvppeak_list}
            results = {'rppg_list': rppg_list, 'bvp_list': bvp_list}
            np.save(pred_exp_dir+'/'+h5_path.split('/')[-1][:-3], results)

            bvp_list = bvp_list.reshape(num_blocks, -1)
            hr_pred = torch.tensor(rppg_list)
            hr_gt = torch.tensor(bvp_list)

            mae_all = mae_loss_func(hr_pred, hr_gt)
            mse_all = mse_loss_func(hr_pred, hr_gt)
            rmse = np.sqrt(mse_all)
            correlation_coefficients = np.corrcoef(rppg_list, bvp_list)[0, 3]
            print("Evaluation Result\n MAE: {:.4f}; RMSE: {:.4f}; R: {:.4f};".format(
                mae_all, rmse, correlation_coefficients))

[screenshot of results] In addition, I tested using the per-subject .npy files under the directory results/2/5, which I saved a week ago when I ran test.py to store the predicted and true values of each subject. I evaluated r, mae, rmse, etc. on each subject's .npy file and found that the results are normal. So what is the problem? [screenshot of results]

zhaodongsun commented 5 months ago

Hi, it seems you get an rmse, mae, and r for each video. This calculation is not correct. You should collect gt_hr and pred_hr from all video clips (30 s) across all videos, and then calculate rmse, mae, and r over them. Also, you calculate the Pearson correlation between the bvp waveform and the rppg waveform; the correct Pearson correlation should be between gt_hr and pred_hr.
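Concretely, the evaluation should look something like this (a sketch; hr_from_fft stands for whatever heart-rate estimator is used, e.g. the FFT-based helper sketched above, and the clip lists are accumulated over all test videos, not reset per video):

import numpy as np

def evaluate(pred_clips, gt_clips, fs):
    # pred_clips / gt_clips: lists of 1-D arrays, one pair per 30-s clip,
    # collected from ALL test videos before any metric is computed
    pred_hr = np.array([hr_from_fft(c, fs) for c in pred_clips])
    gt_hr = np.array([hr_from_fft(c, fs) for c in gt_clips])
    mae = np.mean(np.abs(pred_hr - gt_hr))
    rmse = np.sqrt(np.mean((pred_hr - gt_hr) ** 2))
    r = np.corrcoef(pred_hr, gt_hr)[0, 1]  # Pearson r between heart rates, not waveforms
    return mae, rmse, r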