Closed koutsd closed 4 months ago
Hi @koutsd,
I just quickly tried PBV, as well as POS as a quick sanity check, with the UBFC-rPPG dataset and got the below results without any errors:
===Unsupervised Method ( PBV ) Predicting ===
100%|███████████████████████████████████████████| 42/42 [00:51<00:00, 1.24s/it]
Used Unsupervised Method: PBV
FFT MAE (FFT Label): 15.904017857142858 +/- 3.2501993000580853
FFT RMSE (FFT Label): 26.393506674643252 +/- 199.7121314290944
FFT MAPE (FFT Label): 15.16923203548406 +/- 2.911677776157863
FFT Pearson (FFT Label): 0.47901723809312485 +/- 0.1387932352106574
FFT SNR (FFT Label): -9.127975065225185 +/- 1.336235665783653 (dB)
Saved PBV_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
Saved PBV_UBFC-rPPG_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
===Unsupervised Method ( POS ) Predicting ===
100%|███████████████████████████████████████████| 42/42 [01:04<00:00, 1.53s/it]
Used Unsupervised Method: POS
FFT MAE (FFT Label): 3.9969308035714284 +/- 0.994365555673655
FFT RMSE (FFT Label): 7.5831059532071405 +/- 21.649350686623574
FFT MAPE (FFT Label): 3.8622851481891742 +/- 0.9014916191719023
FFT Pearson (FFT Label): 0.9224921893686093 +/- 0.06103444940234774
FFT SNR (FFT Label): -2.3875030222395335 +/- 1.1356165444835469 (dB)
Saved POS_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
Saved POS_UBFC-rPPG_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW72_SizeH72_ClipLength180_DataTypeRaw_DataAugNone_LabelTypeRaw_Crop_faceTrue_BackendHC_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_unsupervised/bland_altman_plots.
Can you share more details on what exactly you're doing? Specifically...
1) What dataset are you trying to use?
2) What is your (I'm assuming custom) inference config?
Based on the progress bar in the image you attached, I'm guessing you are using either a custom dataset with some data loading issue or a custom config with chunking enabled. More details would help me try to reproduce the error. Generally speaking, my guess is that there is some issue with data_input itself before it's passed on to PBV() in line 39 of unsupervised_predictor.py. Maybe you can try printing it out to see if it's a matrix of zeroes or something else that looks completely off. If that's the issue, there is likely some sort of problem on the data loading side of things, maybe even with your custom dataloader if you happen to be using one.
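The check suggested above can be done with a small summary instead of printing the whole array. This is only a sketch: `summarize_frames` is a hypothetical helper, not part of the toolbox, and it assumes data_input is a NumPy array of frames shaped (T, H, W, 3).

```python
import numpy as np


def summarize_frames(frames):
    """Return basic statistics that expose obviously broken input frames."""
    frames = np.asarray(frames)
    as_float = frames.astype(np.float64)
    return {
        "shape": frames.shape,
        "dtype": str(frames.dtype),
        "min": float(np.nanmin(as_float)),
        "max": float(np.nanmax(as_float)),
        "has_nan": bool(np.isnan(as_float).any()),
        "all_zero": bool(not frames.any()),
    }


# Synthetic stand-in for data_input: four random 8x8 RGB frames.
demo = np.random.default_rng(0).integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
print(summarize_frames(demo))
```

Calling it on the real array right before the PBV() call would show at a glance whether the frames are empty, constant, or contain NaNs.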
Hi @yahskapar, I am trying to use a segment of the MR-NIRP dataset. Here is the dataloader: https://github.com/koutsd/rPPG-Toolbox/blob/main/dataset/data_loader/MRNIRPLoader.py
And the custom inference config which is basically identical to the existing ones for unsupervised methods: https://github.com/koutsd/rPPG-Toolbox/blob/main/configs/infer_configs/MR-NIRP_UNSUPERVISED.yaml
Also, compared with your results, for POS and CHROM I am getting NaN Pearson and 0 SNR, while MAE, RMSE and MAPE seem to be normal. Running inference on some of the pretrained models seems to work fine as well.
Hi @koutsd,
I can't seem to access either of your links (possibly because your fork is private). Can you change your fork's settings so that it's publicly viewable, or maybe just share code snippets here directly?
Your note regarding results being off for POS and CHROM definitely sounds strange. Two more questions:
1) When you say "running inference on some of the pretrained models seems to work fine as well", do you mean using those models to test on MR-NIRP? If you don't mean that, my guess is this is an issue isolated to the MR-NIRP dataset (whether the data loading step or something else potentially).
2) After preprocessing the MR-NIRP dataset, have you tried visualizing the preprocessed data based on the info in the toolbox's README? I'm curious regarding how the preprocessed data looks.
@yahskapar the fork should be public. But anyway, this is the code for the data loader: `
import glob
import os
import re

from tqdm import tqdm
import numpy as np
import zipfile
import cv2
from scipy.io import loadmat

from dataset.data_loader.BaseLoader import BaseLoader


class MRNIRPLoader(BaseLoader):
    """The data loader for the MR-NIRP dataset."""

    def __init__(self, name, data_path, config_data):
        """Initializes an MR-NIRP dataloader.
            Args:
                data_path(str): path of a folder which stores raw video and bvp data.
                -----------------
                     RawData/
                     |   |-- subject1/
                     |       |-- NIR.zip
                     |       |-- RGB.zip
                     |       |-- PulseOX.zip
                     |   |-- subject2/
                     |       |-- NIR.zip
                     |       |-- RGB.zip
                     |       |-- PulseOX.zip
                     |...
                     |   |-- subjectn/
                     |       |-- NIR.zip
                     |       |-- RGB.zip
                     |       |-- PulseOX.zip
                -----------------
                name(string): name of the dataloader.
                config_data(CfgNode): data settings(ref:config.py).
        """
        super().__init__(name, data_path, config_data)

    def get_raw_data(self, data_path):
        """Returns data directories under the path(For MR-NIRP dataset)."""
        data_dirs = glob.glob(data_path + os.sep + "subject*" + os.sep + "*_garage_small_motion_975")
        if not data_dirs:
            raise ValueError("dataset data paths empty!")
        dirs = [{"index": os.path.basename(data_dir), "path": data_dir} for data_dir in data_dirs]
        return dirs

    def split_raw_data(self, data_dirs, begin, end):
        """Returns a subset of data dirs, split with begin and end values."""
        if begin == 0 and end == 1:  # return the full directory if begin == 0 and end == 1
            return data_dirs

        file_num = len(data_dirs)
        choose_range = range(int(begin * file_num), int(end * file_num))
        data_dirs_new = []

        for i in choose_range:
            data_dirs_new.append(data_dirs[i])

        return data_dirs_new

    @staticmethod
    def read_video(video_file, resize_dim=144):
        """Reads a video file, returns frames(T, H, W, 3) """
        cnt = 0
        frames = list()
        with zipfile.ZipFile(video_file, "r") as zippedImgs:
            for ele in zippedImgs.namelist():
                ext = os.path.splitext(ele)[-1]

                if ext == '.pgm':
                    data = zippedImgs.read(ele)
                    frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
                    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

                    # downsample frames (otherwise processing time becomes WAY TOO LONG)
                    if resize_dim is not None:
                        dim_w = min(resize_dim, frame.shape[1])
                        dim_h = int(dim_w * frame.shape[0] / frame.shape[1])
                        frame = cv2.resize(frame, (dim_w, dim_h), interpolation=cv2.INTER_AREA)

                    frame = np.expand_dims(frame, axis=0)

                    if cnt == 0:
                        frames = frame
                    else:
                        frames = np.concatenate((frames, frame), axis=0)
                    cnt += 1

        if cnt == 0:
            raise ValueError('EMPTY VIDEO', video_file)

        return np.asarray(frames)

    @staticmethod
    def read_wave(wave_file):
        """Reads a bvp signal file."""
        with zipfile.ZipFile(wave_file, 'r') as wave_archive:
            mat = loadmat(wave_archive.open('PulseOX/pulseOx.mat'))
            ppg = mat['pulseOxRecord']

        return np.asarray(ppg).flatten()

    def preprocess_dataset(self, data_dirs, config_preprocess, begin=0, end=1):
        """Preprocesses the raw data."""
        file_num = len(data_dirs)
        for i in tqdm(range(file_num)):
            # Read Video Frames
            frames = self.read_video(os.path.join(data_dirs[i]['path'], "RGB.zip"))

            # Read Labels
            if config_preprocess.USE_PSUEDO_PPG_LABEL:
                bvps = self.generate_pos_psuedo_labels(frames, fs=self.config_data.FS)
            else:
                bvps = self.read_wave(os.path.join(data_dirs[i]['path'], "PulseOx.zip"))

            target_length = frames.shape[0]
            bvps = BaseLoader.resample_ppg(bvps, target_length)
            frames_clips, bvps_clips = self.preprocess(frames, bvps, config_preprocess)
            self.preprocessed_data_len += self.save(frames_clips, bvps_clips, data_dirs[i]["index"])
`
Ah sorry, the previous links worked but I guess somehow the hyperlink is directed elsewhere. When I copy/pasted them I was able to see your fork.
Take a look at visualizing the preprocessed data and let me know how that looks; it could be an indicator of what might be going wrong here. Two more things from my side:
1) I saw that you are using chunking in your config - have you tried not using chunking to see if this issue disappears? If that's not an option, maybe you can try increasing the chunk size (start with doubling it). It's possible that using chunking or the chunk size in particular is causing issues with the generated Q prior to W = np.linalg.solve(Q, np.swapaxes(pbv, 0, 1)). Specifically, you might need more data frames to generate a reasonable Q that is non-singular.
2) Do your other results (e.g., aside from PBV) look reasonable? Can you copy/paste a few examples of those metrics? I wonder if it's possible that the downsampling you do in the dataloader is problematic. Depending on your other results and if they look particularly bad, it makes sense to me that downsampling could reduce important signal variations to the point where some algorithms aren't feasible.
Same error when I do not use chunking. The value of Q seems to be:
[[[3621.1177 3621.1177 3621.1177]
  [3621.1177 3621.1177 3621.1177]
  [3621.1177 3621.1177 3621.1177]]]
so it makes sense that the inverse cannot be calculated.
One thing I forgot to mention is that the same error appeared with ICA, but changing np.linalg.inv to pinv seems to make it work fine with reasonable results.
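To make the failure concrete, the sketch below rebuilds the Q printed above and compares np.linalg.solve with np.linalg.pinv. The right-hand side `b` is a placeholder rather than the real pbv vector, and the explicit `tol` and `rcond` values are illustrative choices, not toolbox settings.

```python
import numpy as np

# Q as posted above: every entry is identical, so all three rows are the same.
Q = np.full((1, 3, 3), 3621.1177, dtype=np.float64)
b = np.ones((1, 3, 1))  # placeholder right-hand side, not the real pbv vector

# A 3x3 matrix needs rank 3 to be invertible.
rank = int(np.linalg.matrix_rank(Q[0], tol=1e-6))
print("rank of Q:", rank)

try:
    np.linalg.solve(Q, b)
    solve_failed = False
except np.linalg.LinAlgError as err:
    solve_failed = True
    print("solve failed:", err)

# The pseudo-inverse exists even for a singular matrix and yields the
# minimum-norm least-squares solution instead of raising.
W = np.linalg.pinv(Q, rcond=1e-10) @ b
print(W[0].ravel())
```

A rank below 3 means the three channel traces are not linearly independent. Switching to pinv avoids the exception but does not by itself explain why the channels ended up identical.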
Visualizing the preprocessed data I use for PBV seems to indicate that there is an issue with preprocessing.
However visualizing the preprocessed data that I used for inference on DeepPhys as an example doesn't appear to have the same issue (Here is the inference config: https://github.com/koutsd/rPPG-Toolbox/blob/main/configs/infer_configs/MR-NIRP_DEEPPHYS_BASIC.yaml)
Here are some other results I got:
ICA:
FFT MAE (FFT Label): 12.156188605108055 +/- 0.6237049901814224
FFT RMSE (FFT Label): 18.595105539484255 +/- 30.586266386799505
FFT MAPE (FFT Label): 17.753297782767326 +/- 0.9811269806704203
FFT Pearson (FFT Label): 0.10270500445779261 +/- 0.04417670457312846
FFT SNR (FFT Label): -7.338134895442314 +/- 0.5621326375759251 (dB)
POS:
FFT MAE (FFT Label): 14.034872298624755 +/- 0.5867216015896881
FFT RMSE (FFT Label): 19.29240626424388 +/- 25.6623413310728
FFT MAPE (FFT Label): 17.340880032432093 +/- 0.6320355299885947
FFT Pearson (FFT Label): nan +/- nan
FFT SNR (FFT Label): nan +/- nan (dB)
CHROM:
FFT MAE (FFT Label): 14.034872298624755 +/- 0.5867216015896881
FFT RMSE (FFT Label): 19.29240626424388 +/- 25.6623413310728
FFT MAPE (FFT Label): 17.340880032432093 +/- 0.6320355299885947
FFT Pearson (FFT Label): nan +/- nan
FFT SNR (FFT Label): 0.0 +/- 0.0 (dB)
GREEN:
FFT MAE (FFT Label): 11.879911591355599 +/- 0.5917352544239702
FFT RMSE (FFT Label): 17.870617241284066 +/- 27.93667231103747
FFT MAPE (FFT Label): 17.069884928431097 +/- 0.9053708450075868
FFT Pearson (FFT Label): 0.1239876159018555 +/- 0.04406886915953198
FFT SNR (FFT Label): -7.034770138632807 +/- 0.555316366028882 (dB)
LGI:
FFT MAE (FFT Label): 11.879911591355599 +/- 0.5917352544239702
FFT RMSE (FFT Label): 17.870617241284066 +/- 27.93667231103747
FFT MAPE (FFT Label): 17.069884928431097 +/- 0.9053708450075868
FFT Pearson (FFT Label): 0.1239876159018555 +/- 0.04406886915953198
FFT SNR (FFT Label): -7.034770138632807 +/- 0.555316366028882 (dB)
Pretrained model from PURE_DeepPhys.pth:
FFT MAE (FFT Label): 7.320205479452055 +/- 0.8215448725667628
FFT RMSE (FFT Label): 10.141781629929417 +/- 21.973822756598874
FFT MAPE (FFT Label): 10.690965433149405 +/- 1.2611468660258334
FFT Pearson (FFT Label): 0.6254225147458197 +/- 0.09260294447064851
FFT SNR (FFT Label): -6.139698874487016 +/- 0.9795437171407604 (dB)
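One way a NaN Pearson value can arise is a prediction series with zero variance, for example the same heart rate estimated for every clip. The thread does not confirm that this is what happened for POS and CHROM, although their identical MAE, RMSE and MAPE values would be consistent with it. The sketch below uses a hypothetical `pearson` helper and made-up heart rates purely to illustrate the effect.

```python
import numpy as np


def pearson(pred_hr, label_hr):
    """Pearson correlation; NaN when either series has zero variance."""
    pred_hr = np.asarray(pred_hr, dtype=np.float64)
    label_hr = np.asarray(label_hr, dtype=np.float64)
    if pred_hr.std() == 0 or label_hr.std() == 0:
        return float("nan")  # 0/0 in the formula: the correlation is undefined
    return float(np.corrcoef(pred_hr, label_hr)[0, 1])


labels = [62.0, 75.0, 88.0, 70.0]  # made-up label heart rates (bpm)
print(pearson([60.0, 60.0, 60.0, 60.0], labels))  # constant predictions
print(pearson([61.0, 77.0, 85.0, 72.0], labels))  # varying predictions
```

Printing the per-clip predicted heart rates for POS and CHROM would show quickly whether they are constant.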
UPDATE:
I believe I resolved the issue. The video frames in the dataset are stored as 10-bit raw RGB images that have to be demosaiced to obtain the RGB channels. Also, for some reason the frames are stored in the .pgm file format, which is used for grayscale images. As a result, imread was reading the images as grayscale. Strangely, the inference results were not affected, but PBV does not crash anymore.
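The link between grayscale frames and the singular Q can be reproduced with synthetic data. This is a simplified stand-in for the normalization and covariance step, not the toolbox's PBV implementation, and it assumes that decoding a single-channel image as color yields three identical channels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic clip: 60 grayscale frames replicated into three "color" channels.
gray = rng.integers(1, 256, size=(60, 8, 8), dtype=np.uint8)
frames = np.stack([gray, gray, gray], axis=-1)  # (T, H, W, 3)

# Spatially average each frame, then normalize each channel by its temporal mean.
rgb = frames.reshape(frames.shape[0], -1, 3).mean(axis=1)  # (T, 3)
C = (rgb / rgb.mean(axis=0)).T  # (3, T): all three rows are identical

Q = C @ C.T  # (3, 3): every entry is the same dot product
channels_identical = bool(np.array_equal(C[0], C[1]) and np.array_equal(C[1], C[2]))
print(channels_identical)
print(Q)
```

With identical rows in C, every entry of Q is the same number, which matches the shape of the Q matrix posted earlier in the thread.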
Here is the updated read_video method: `
def read_video(video_file):
    frames = list()
    all_pgm = sorted(glob.glob(os.path.join(video_file, "Frame*.pgm")))
    for pgm_path in all_pgm:
        frame = cv2.imread(pgm_path, cv2.IMREAD_UNCHANGED)  # read 10-bit raw image
        frame = cv2.cvtColor(frame, cv2.COLOR_BAYER_BG2RGB)  # demosaic to RGB
        frame = cv2.convertScaleAbs(frame, alpha=(255.0/65535.0))  # convert from uint16 to uint8
        frames.append(frame)
    return np.asarray(frames, dtype=np.uint8)
`
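One detail worth checking in the uint16 to uint8 conversion is the scale factor. The thread does not say whether the 10-bit samples fill the 16-bit range or sit in the low 10 bits, so the right divisor is an open question. The NumPy sketch below uses a hypothetical `scale_to_uint8` helper to approximate the linear scaling and show how much the choice matters.

```python
import numpy as np


def scale_to_uint8(frame, max_value):
    """Linearly map integer samples in [0, max_value] to uint8 [0, 255]."""
    scaled = np.rint(frame.astype(np.float64) * (255.0 / max_value))
    return np.clip(scaled, 0, 255).astype(np.uint8)


samples = np.array([0, 512, 1023], dtype=np.uint16)  # values within a 10-bit range

print(scale_to_uint8(samples, 65535))  # assumes samples span the full 16-bit range
print(scale_to_uint8(samples, 1023))   # assumes samples occupy only the low 10 bits
```

If the loaded frames never exceed 1023, dividing by 65535 compresses them into the bottom few gray levels, so checking frame.max() after imread is a cheap way to pick the divisor.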
Good to hear that the error is gone at least. How does the newly preprocessed data visualization look? I should note your results seem reasonable depending on what portion of the MR-NIRP dataset you're evaluating on:
Also, I planned on including MR-NIRP a while ago but ended up getting too busy. Would you be interested in making a pull request to add the dataset (e.g., data loader, etc.) to this toolbox and effectively contribute to it? I'm happy to take a look at the PR and suggest or make some modifications, as well as test with the dataset after re-downloading it.
Visualizing the preprocessed data where the frames are DiffNormalized and Standardized looks like this:
However, when the data type is set to raw, the visualization still looks like this:
I'm wondering whether it's a bug in the visualization notebook, since the frames seem to be loaded correctly elsewhere.
As for the results, I was evaluating on minimal head motion - garage, so there seems to be a large amount of error. I am not sure whether I am setting up the preprocessing correctly or there is still something wrong with the dataloader.
Also, I would be happy to make a PR to add the dataset to the toolbox.
The visualization you shared with the difference frame looks more reasonable. As for the visualization of raw data, there could be a bug - I'll try and look into that sometime in a few weeks when I have more time.
Thanks for putting up the PR (#244) - I appreciate it! I won't have time to take a thorough look at it and test with MR-NIRP myself for a few weeks, but I will ping you when I do take a look.
Thanks a lot for the help.
As for the visualization of the raw data, I believe I found the bug. In __getitem__ of the BaseLoader, the loaded data are cast to float32:
When the preprocess method is called, the data need to be of type uint8 (0..255) because that is necessary for the face detection. After crop_face_resize, if the data are to be Standardized or DiffNormalized they are later converted to float32 (0..1), but if they are raw they are saved as uint8.
Normalizing the raw data after crop_face_resize seems to fix the issue.
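A minimal sketch of that kind of normalization, assuming it amounts to dividing uint8 frames by 255. `to_unit_float` is a hypothetical helper for illustration, not the change that went into the toolbox.

```python
import numpy as np


def to_unit_float(frames):
    """Convert uint8 frames (0..255) to float32 in 0..1; pass floats through."""
    frames = np.asarray(frames)
    if frames.dtype == np.uint8:
        return frames.astype(np.float32) / 255.0
    return frames.astype(np.float32)


raw = np.array([[0, 128, 255]], dtype=np.uint8)
print(to_unit_float(raw))
```

Applying the conversion only after face detection keeps the uint8 input that crop_face_resize needs, while the saved raw data end up in the same 0..1 float range as the other data types.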
Here is how the raw data look in the visualization notebook after that:
Hi, I am trying to run the PBV Unsupervised Method and I am getting the following error: