ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

PBV Error #243

Closed koutsd closed 1 month ago

koutsd commented 7 months ago

Hi, I am trying to run the PBV unsupervised method and I am getting the following error: [image]

yahskapar commented 6 months ago

Hi @koutsd,

I just quickly tried PBV, as well as POS as a quick sanity check, with the UBFC-rPPG dataset and got the below results without any errors:

===Unsupervised Method ( PBV ) Predicting ===
100%|███████████████████████████████████████████| 42/42 [00:51<00:00,  1.24s/it]
Used Unsupervised Method: PBV
FFT MAE (FFT Label): 15.904017857142858 +/- 3.2501993000580853
FFT RMSE (FFT Label): 26.393506674643252 +/- 199.7121314290944
FFT MAPE (FFT Label): 15.16923203548406 +/- 2.911677776157863
FFT Pearson (FFT Label): 0.47901723809312485 +/- 0.1387932352106574
FFT SNR (FFT Label): -9.127975065225185 +/- 1.336235665783653 (dB)
Saved PBV_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
Saved PBV_UBFC-rPPG_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
===Unsupervised Method ( POS ) Predicting ===
100%|███████████████████████████████████████████| 42/42 [01:04<00:00,  1.53s/it]
Used Unsupervised Method: POS
FFT MAE (FFT Label): 3.9969308035714284 +/- 0.994365555673655
FFT RMSE (FFT Label): 7.5831059532071405 +/- 21.649350686623574
FFT MAPE (FFT Label): 3.8622851481891742 +/- 0.9014916191719023
FFT Pearson (FFT Label): 0.9224921893686093 +/- 0.06103444940234774
FFT SNR (FFT Label): -2.3875030222395335 +/- 1.1356165444835469 (dB)
Saved POS_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
Saved POS_UBFC-rPPG_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.

Can you share more details on what exactly you're doing? Specifically...

1) What dataset are you trying to use?

2) What is your (I'm assuming custom) inference config?

Based on the progress bar in the image you attached, I'm guessing you are either using a custom dataset where there may be some data loading issue, or a custom config with chunking enabled. It would be good to have more details so that I can try to reproduce the error. Generally speaking, my guess is that there is some issue with data_input itself before it's passed on to PBV() in line 39 of unsupervised_predictor.py. Maybe you can try printing it out to see if it's a matrix of zeros or something else that looks completely off. If that's the issue, there is likely some sort of problem on the data loading side of things, maybe even with your custom dataloader if you happen to be using one.
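A minimal sketch of the kind of check suggested above, assuming data_input is a (T, H, W, 3) array of frames. The helper name `summarize_input` and the synthetic clip are hypothetical and not part of the toolbox; the idea is simply to flag all-zero data, NaNs, or channels that are exact copies of each other.

```python
import numpy as np


def summarize_input(frames):
    """Return basic health checks for a (T, H, W, 3) frame array."""
    frames = np.asarray(frames, dtype=np.float64)
    return {
        "shape": frames.shape,
        "min": float(np.nanmin(frames)),
        "max": float(np.nanmax(frames)),
        "has_nan": bool(np.isnan(frames).any()),
        "all_zero": bool(not frames.any()),
        # Identical channels suggest a grayscale image was loaded as RGB.
        "channels_identical": bool(
            np.array_equal(frames[..., 0], frames[..., 1])
            and np.array_equal(frames[..., 1], frames[..., 2])
        ),
    }


# Synthetic stand-in for data_input: a grayscale clip replicated into 3 channels.
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(4, 8, 8, 1), dtype=np.uint8)
report = summarize_input(np.repeat(gray, 3, axis=-1))
print(report)
```

Calling something like this on the array right before it reaches the unsupervised method should make degenerate inputs visible without having to read the raw printout.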

koutsd commented 6 months ago

Hi @yahskapar, I am trying to use a segment of the MR-NIRP dataset. Here is the dataloader: https://github.com/koutsd/rPPG-Toolbox/blob/main/dataset/data_loader/MRNIRPLoader.py

And the custom inference config which is basically identical to the existing ones for unsupervised methods: https://github.com/koutsd/rPPG-Toolbox/blob/main/configs/infer_configs/MR-NIRP_UNSUPERVISED.yaml

Also seeing your results, in POS and CHROM I am getting nan Pearson and 0 SNR while MAE, RMSE and MAPE seem to be normal. Running inference on some of the pretrained models seems to work fine as well.

yahskapar commented 6 months ago

Hi @koutsd,

I can't seem to access either of your links (possibly because your fork is private). Can you change your fork's settings so that it's publicly viewable, or maybe just share code snippets here directly?

Your note regarding results being off for POS and CHROM definitely sounds strange. Two more questions:

1) When you say "running inference on some of the pretrained models seems to work fine as well", do you mean using those models to test on MR-NIRP? If you don't mean that, my guess is this is an issue isolated to the MR-NIRP dataset (whether the data loading step or something else potentially).

2) After preprocessing the MR-NIRP dataset, have you tried visualizing the preprocessed data based on the info in the toolbox's README? I'm curious regarding how the preprocessed data looks.

koutsd commented 6 months ago

@yahskapar the fork should be public. But anyway, this is the code for the data loader:

```python
import glob
import os
import re

from tqdm import tqdm
import numpy as np

import zipfile

import cv2

from scipy.io import loadmat

from dataset.data_loader.BaseLoader import BaseLoader


class MRNIRPLoader(BaseLoader):
    """The data loader for the MR-NIRP Processed dataset."""

    def __init__(self, name, data_path, config_data):
        """Initializes an MR-NIRP dataloader.
            Args:
                data_path(str): path of a folder which stores raw video and bvp data.
                e.g. data_path should be "RawData" for below dataset structure:
                     RawData/
                     |   |-- subject1/
                     |       |-- NIR.zip
                     |       |-- RGB.zip
                     |       |-- PulseOX.zip
                     |   |-- subject2/
                     |       |-- NIR.zip
                     |       |-- RGB.zip
                     |       |-- PulseOX.zip
                     |...
                     |   |-- subjectn/
                     |       |-- NIR.zip
                     |       |-- RGB.zip
                     |       |-- PulseOX.zip
                -----------------
                name(string): name of the dataloader.
                config_data(CfgNode): data settings(ref:config.py).
        """
        super().__init__(name, data_path, config_data)

    def get_raw_data(self, data_path):
        """Returns data directories under the path(For UBFC-rPPG dataset)."""
        data_dirs = glob.glob(data_path + os.sep + "subject*" + os.sep + "*_garage_small_motion_975")
        if not data_dirs:
            raise ValueError("dataset data paths empty!")
        dirs = [{"index": os.path.basename(data_dir), "path": data_dir} for data_dir in data_dirs]
        return dirs

    def split_raw_data(self, data_dirs, begin, end):
        """Returns a subset of data dirs, split with begin and end values."""
        if begin == 0 and end == 1:  # return the full directory if begin == 0 and end == 1
            return data_dirs

        file_num = len(data_dirs)
        choose_range = range(int(begin * file_num), int(end * file_num))
        data_dirs_new = []

        for i in choose_range:
            data_dirs_new.append(data_dirs[i])

        return data_dirs_new

    @staticmethod
    def read_video(video_file, resize_dim=144):
        """Reads a video file, returns frames(T, H, W, 3) """
        cnt = 0
        frames = list()
        with zipfile.ZipFile(video_file, "r") as zippedImgs:
            for ele in zippedImgs.namelist():
                ext = os.path.splitext(ele)[-1]

                if ext == '.pgm':
                    data = zippedImgs.read(ele)
                    frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
                    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

                    # downsample frames (otherwise processing time becomes WAY TOO LONG)
                    if resize_dim is not None:
                        dim_w = min(resize_dim, frame.shape[1])
                        dim_h = int(dim_w * frame.shape[0] / frame.shape[1])
                        frame = cv2.resize(frame, (dim_w, dim_h), interpolation=cv2.INTER_AREA)
                        frame = np.expand_dims(frame, axis=0)

                    if cnt == 0:
                        frames = frame
                    else:
                        frames = np.concatenate((frames, frame), axis=0)
                    cnt += 1

        if cnt == 0:
            raise ValueError('EMPTY VIDEO', video_file)

        return np.asarray(frames)

    @staticmethod
    def read_wave(wave_file):
        """Reads a bvp signal file."""
        with zipfile.ZipFile(wave_file, 'r') as wave_archive:
            mat = loadmat(wave_archive.open('PulseOX/pulseOx.mat'))
            ppg = mat['pulseOxRecord']

        return np.asarray(ppg).flatten()

    def preprocess_dataset(self, data_dirs, config_preprocess, begin=0, end=1):
        """Preprocesses the raw data."""
        file_num = len(data_dirs)
        for i in tqdm(range(file_num)):
            # Read Video Frames
            frames = self.read_video(os.path.join(data_dirs[i]['path'], "RGB.zip"))

            # Read Labels
            if config_preprocess.USE_PSUEDO_PPG_LABEL:
                bvps = self.generate_pos_psuedo_labels(frames, fs=self.config_data.FS)
            else:
                bvps = self.read_wave(os.path.join(data_dirs[i]['path'], "PulseOx.zip"))

            target_length = frames.shape[0]
            bvps = BaseLoader.resample_ppg(bvps, target_length)
            frames_clips, bvps_clips = self.preprocess(frames, bvps, config_preprocess)
            self.preprocessed_data_len += self.save(frames_clips, bvps_clips, data_dirs[i]["index"])
```

  1. Yes, that is what I mean. I am using the pretrained models to predict rPPG on the MR-NIRP videos.
  2. I have not tried that, to be honest. I will take a look at it now.

yahskapar commented 6 months ago

Ah sorry, the previous links worked but I guess somehow the hyperlink is directed elsewhere. When I copy/pasted them I was able to see your fork.

Take a look at visualizing the preprocessed data and let me know how that looks; it could be an indicator of what might be going wrong here. Two more things from my side:

1) I saw that you are using chunking in your config - have you tried not using chunking to see if this issue disappears? If that's not an option, maybe you can try increasing the chunk size (start with doubling it). It's possible that using chunking or the chunk size in particular is causing issues with the generated Q prior to W = np.linalg.solve(Q, np.swapaxes(pbv, 0, 1)). Specifically, you might need more data frames to generate a reasonable Q that is non-singular.

2) Do your other results (e.g., aside from PBV) look reasonable? Can you copy/paste a few examples of those metrics? I wonder if it's possible that the downsampling you do in the dataloader is problematic. Depending on your other results and if they look particularly bad, it makes sense to me that downsampling could reduce important signal variations to the point where some algorithms aren't feasible.

koutsd commented 6 months ago

Same error when I do not use chunking. The value of Q seems to be:

[[[3621.1177 3621.1177 3621.1177]
  [3621.1177 3621.1177 3621.1177]
  [3621.1177 3621.1177 3621.1177]]]

All entries are identical, so Q is singular and it makes sense that its inverse cannot be computed.

One thing I forgot to mention is that the same error appeared with ICA, but changing np.linalg.inv to np.linalg.pinv seems to make it work fine with reasonable results.
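To illustrate why the solve step fails on a matrix like this, here is a small sketch using the Q value reported above. The right-hand side `pbv` is a made-up vector standing in for the PBV signature; it is an assumption for illustration only, not a value from the toolbox.

```python
import numpy as np

# Q as reported in this thread: every entry is identical, so the rank is 1.
Q = np.full((3, 3), 3621.1177)
# Hypothetical right-hand side standing in for the PBV signature vector.
pbv = np.array([0.55, 0.60, 0.58])

rank = int(np.linalg.matrix_rank(Q))

try:
    np.linalg.solve(Q, pbv)
    solve_failed = False
except np.linalg.LinAlgError:
    solve_failed = True

# The pseudo-inverse returns the minimum-norm least-squares solution instead.
W = np.linalg.pinv(Q) @ pbv
print(rank, solve_failed, W)
```

Note that the pseudo-inverse only hides the symptom: with a rank-1 Q the three channels carry the same information, so the resulting weights are unlikely to be meaningful for pulse extraction.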

Visualizing the preprocessed data I use for PBV seems to indicate that there is an issue with preprocessing: [image]

However, visualizing the preprocessed data that I used for inference on DeepPhys, as an example, doesn't appear to have the same issue (here is the inference config: https://github.com/koutsd/rPPG-Toolbox/blob/main/configs/infer_configs/MR-NIRP_DEEPPHYS_BASIC.yaml): [image]

Here are some other results I got:

ICA:
FFT MAE (FFT Label): 12.156188605108055 +/- 0.6237049901814224
FFT RMSE (FFT Label): 18.595105539484255 +/- 30.586266386799505
FFT MAPE (FFT Label): 17.753297782767326 +/- 0.9811269806704203
FFT Pearson (FFT Label): 0.10270500445779261 +/- 0.04417670457312846
FFT SNR (FFT Label): -7.338134895442314 +/- 0.5621326375759251 (dB)

POS:
FFT MAE (FFT Label): 14.034872298624755 +/- 0.5867216015896881
FFT RMSE (FFT Label): 19.29240626424388 +/- 25.6623413310728
FFT MAPE (FFT Label): 17.340880032432093 +/- 0.6320355299885947
FFT Pearson (FFT Label): nan +/- nan
FFT SNR (FFT Label): nan +/- nan (dB)

CHROM:
FFT MAE (FFT Label): 14.034872298624755 +/- 0.5867216015896881
FFT RMSE (FFT Label): 19.29240626424388 +/- 25.6623413310728
FFT MAPE (FFT Label): 17.340880032432093 +/- 0.6320355299885947
FFT Pearson (FFT Label): nan +/- nan
FFT SNR (FFT Label): 0.0 +/- 0.0 (dB)

GREEN:
FFT MAE (FFT Label): 11.879911591355599 +/- 0.5917352544239702
FFT RMSE (FFT Label): 17.870617241284066 +/- 27.93667231103747
FFT MAPE (FFT Label): 17.069884928431097 +/- 0.9053708450075868
FFT Pearson (FFT Label): 0.1239876159018555 +/- 0.04406886915953198
FFT SNR (FFT Label): -7.034770138632807 +/- 0.555316366028882 (dB)

LGI:
FFT MAE (FFT Label): 11.879911591355599 +/- 0.5917352544239702
FFT RMSE (FFT Label): 17.870617241284066 +/- 27.93667231103747
FFT MAPE (FFT Label): 17.069884928431097 +/- 0.9053708450075868
FFT Pearson (FFT Label): 0.1239876159018555 +/- 0.04406886915953198
FFT SNR (FFT Label): -7.034770138632807 +/- 0.555316366028882 (dB)

Pretrained model from PURE_DeepPhys.pth:
FFT MAE (FFT Label): 7.320205479452055 +/- 0.8215448725667628
FFT RMSE (FFT Label): 10.141781629929417 +/- 21.973822756598874
FFT MAPE (FFT Label): 10.690965433149405 +/- 1.2611468660258334
FFT Pearson (FFT Label): 0.6254225147458197 +/- 0.09260294447064851
FFT SNR (FFT Label): -6.139698874487016 +/- 0.9795437171407604 (dB)

koutsd commented 6 months ago

UPDATE:

I believe I resolved the issue. The video frames in the dataset are stored as 10-bit raw RGB images that have to be demosaiced to obtain the RGB channels. Also, for some reason the frames are stored in the .pgm file format, which is used for grayscale images. As a result, the loader was reading the images as grayscale. Strangely, the inference results were not affected, but PBV does not crash anymore.

Here is the updated read_video method:

```python
def read_video(video_file):
    frames = list()
    all_pgm = sorted(glob.glob(os.path.join(video_file, "Frame*.pgm")))
    for pgm_path in all_pgm:
        frame = cv2.imread(pgm_path, cv2.IMREAD_UNCHANGED)          # read 10-bit raw image
        frame = cv2.cvtColor(frame, cv2.COLOR_BAYER_BG2RGB)         # demosaic to an RGB image
        frame = cv2.convertScaleAbs(frame, alpha=(255.0/65535.0))   # convert from uint16 to uint8

        frames.append(frame)

    return np.asarray(frames, dtype=np.uint8)
```
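A possible explanation for the nan Pearson and SNR values reported earlier for POS, assuming the frames really did contain three identical channels: the POS projection only uses differences between channels, so it cancels completely on grayscale input. The sketch below is a simplified, single-window version written for illustration and is not the toolbox's implementation; the synthetic signal `gray_mean` is made up.

```python
import numpy as np

# POS projection matrix from Wang et al. (2017).
P = np.array([[0.0, 1.0, -1.0],
              [-2.0, 1.0, 1.0]])

# Hypothetical per-frame channel means for a grayscale clip loaded as RGB:
# the three rows (R, G, B) are identical.
t = np.arange(64)
gray_mean = 100.0 + np.sin(2 * np.pi * t / 16)
C = np.stack([gray_mean, gray_mean, gray_mean])

# Temporal normalization, then projection onto the two POS axes.
Cn = C / C.mean(axis=1, keepdims=True)
S = P @ Cn

# Both projected signals vanish, so the std ratio used to combine them is 0/0.
with np.errstate(invalid="ignore", divide="ignore"):
    alpha = np.std(S[0]) / np.std(S[1])
print(np.abs(S).max(), alpha)
```

Under the same assumption, a chrominance-based method such as CHROM would also end up subtracting two identical signals, which would be consistent with the 0 SNR reported for it.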

yahskapar commented 6 months ago

Good to hear that the error is gone at least. How does the newly preprocessed data visualization look? I should note your results seem reasonable depending on what portion of the MR-NIRP dataset you're evaluating on:

[image: Screenshot from 2024-02-09 15-53-36]

Also, I planned on including MR-NIRP a while ago but ended up getting too busy. Would you be interested in making a pull request to add the dataset (e.g., data loader, etc.) to this toolbox and effectively contribute to it? I'm happy to take a look at the PR and suggest or make some modifications, as well as test with the dataset after re-downloading it.

koutsd commented 6 months ago

Visualizing the preprocessed data where the frames are DiffNormalized and Standardized looks like this: [image]

However, when the data type is set to raw, the visualization still looks like this: [image]

I'm wondering whether it's a bug in the visualization notebook, since the frames seem to be loaded correctly elsewhere.

As for the results, I was evaluating on the minimal head motion (garage) subset, so the error does seem large. I am not sure whether I am setting up the preprocessing correctly or whether there is still something wrong with the dataloader.

Also, I would be happy to make a PR to add the dataset to the toolbox.

yahskapar commented 6 months ago

The visualization you shared with the difference frame looks more reasonable. As for the visualization of raw data, there could be a bug - I'll try and look into that sometime in a few weeks when I have more time.

Thanks for putting up the PR (#244) - I appreciate it! I won't have time to take a thorough look at it and test with MR-NIRP myself for a few weeks, but I will ping you when I do take a look.

koutsd commented 6 months ago

Thanks a lot for the help.

As for the visualization of the raw data, I believe I found the bug. In __getitem__ of the BaseLoader, the loaded data are cast to float32: [image]

When the preprocess method is called, the data need to be of type uint8 (0..255) because that is necessary for face detection. After crop_face_resize, if the data are to be Standardized or DiffNormalized, they are later converted to float32 (0..1), but if they are raw they are saved as uint8: [image]

Normalizing the raw data after crop_face_resize seems to fix the issue: [image]

Here is how the raw data look in the visualization notebook after that: [image]
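The exact change is only visible in the screenshots, so the following is just a sketch of what normalizing raw frames could look like, assuming the frames are uint8 in 0..255 after cropping and that dividing by 255 is the intended scaling. The helper name `normalize_raw_frames` is hypothetical and not part of the toolbox.

```python
import numpy as np


def normalize_raw_frames(frames):
    """Scale uint8 frames (0..255) to float32 in [0, 1]; hypothetical helper."""
    frames = np.asarray(frames)
    if frames.dtype == np.uint8:
        return frames.astype(np.float32) / 255.0
    # Already floating point: leave the values untouched.
    return frames.astype(np.float32)


raw = np.array([[[[0, 128, 255]]]], dtype=np.uint8)  # shape (T=1, H=1, W=1, C=3)
scaled = normalize_raw_frames(raw)
print(scaled.dtype, scaled.min(), scaled.max())
```

With the raw frames in the same 0..1 float range as the other data types, a later cast to float32 no longer changes how they are displayed.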