Closed Dylan-H-Wang closed 11 months ago
Hi @Dylan-H-Wang,
I'm unable to test this myself at the moment, so just to make sure I understand the bug you noticed, since the issue title and your post seem a bit confusing together: are you saying that when pre-processing a dataset for the first time, if someone forgets to set DO_PREPROCESS to True, the appropriate error is thrown for the MMPD dataset but not the UBFC-rPPG dataset? Or that even if you have the MMPD dataset preprocessed, the error is still shown?
Once you clarify, I can look into this a bit further, though I'm not sure the fixes you applied would work, since they may cause a similar issue to pop up for other datasets that rely on that index key.
Here is my MMPD data preprocessing YAML file:
# use default train/val split with ratio 8/2
BASE: ['']
TRAIN:
  DATA:
    INFO:
      LIGHT: [1, 2, 3] # 1 - LED-Low, 2 - LED-high, 3 - Incandescent, 4 - Nature
      MOTION: [1] # 1 - Stationary, 2 - Rotation, 3 - Talking, 4 - Walking
      EXERCISE: [2] # 1 - True, 2 - False
      SKIN_COLOR: [3] # Fitzpatrick Scale Skin Types - 3, 4, 5, 6
      GENDER: [1, 2] # 1 - Male, 2 - Female
      GLASSER: [1, 2] # 1 - True, 2 - False
      HAIR_COVER: [1, 2] # 1 - True, 2 - False
      MAKEUP: [1, 2] # 1 - True, 2 - False
    FS: 30
    DATASET: MMPD
    DO_PREPROCESS: False # if first time, should be true
    DATA_FORMAT: NDCHW
    DATA_PATH: "./data/MMPD/raw_data" # Raw dataset path, needs to be updated
    CACHED_PATH: "./data/MMPD/preprocessed_data" # Processed dataset save path, needs to be updated
    FILE_LIST_PATH: "./data/MMPD/preprocessed_data/data_file_list" # Path to store file lists, needs to be updated
    EXP_DATA_NAME: ""
    BEGIN: 0.0
    END: 0.8
    PREPROCESS:
      DATA_TYPE: ['Standardized']
      LABEL_TYPE: Standardized
      DO_CHUNK: True
      CHUNK_LENGTH: 180
      CROP_FACE:
        DO_CROP_FACE: True
        USE_LARGE_FACE_BOX: True
        LARGE_BOX_COEF: 1.5
        DETECTION:
          DO_DYNAMIC_DETECTION: False
          DYNAMIC_DETECTION_FREQUENCY: 30
          USE_MEDIAN_FACE_BOX: False # This should be used ONLY if dynamic detection is used
      RESIZE:
        H: 72
        W: 72
VALID:
  DATA:
    INFO:
      LIGHT: [1, 2, 3] # 1 - LED-Low, 2 - LED-high, 3 - Incandescent, 4 - Nature
      MOTION: [1] # 1 - Stationary, 2 - Rotation, 3 - Talking, 4 - Walking
      EXERCISE: [2] # 1 - True, 2 - False
      SKIN_COLOR: [3] # Fitzpatrick Scale Skin Types - 3, 4, 5, 6
      GENDER: [1, 2] # 1 - Male, 2 - Female
      GLASSER: [1, 2] # 1 - True, 2 - False
      HAIR_COVER: [1, 2] # 1 - True, 2 - False
      MAKEUP: [1, 2] # 1 - True, 2 - False
    FS: 30
    DATASET: MMPD
    DO_PREPROCESS: False # if first time, should be true
    DATA_FORMAT: NDCHW
    DATA_PATH: "./data/MMPD/raw_data" # Raw dataset path, needs to be updated
    CACHED_PATH: "./data/MMPD/preprocessed_data" # Processed dataset save path, needs to be updated
    FILE_LIST_PATH: "./data/MMPD/preprocessed_data/data_file_list" # Path to store file lists, needs to be updated
    EXP_DATA_NAME: ""
    BEGIN: 0.8
    END: 1.0
    PREPROCESS:
      DATA_TYPE: ['Standardized']
      LABEL_TYPE: Standardized
      DO_CHUNK: True
      CHUNK_LENGTH: 180
      CROP_FACE:
        DO_CROP_FACE: True
        USE_LARGE_FACE_BOX: True
        LARGE_BOX_COEF: 1.5
        DETECTION:
          DO_DYNAMIC_DETECTION: False
          DYNAMIC_DETECTION_FREQUENCY: 30
          USE_MEDIAN_FACE_BOX: False # This should be used ONLY if dynamic detection is used
      RESIZE:
        H: 72
        W: 72
TEST:
  USE_LAST_EPOCH: False
  DATA:
    INFO:
      LIGHT: [1, 2, 3] # 1 - LED-Low, 2 - LED-high, 3 - Incandescent, 4 - Nature
      MOTION: [1] # 1 - Stationary, 2 - Rotation, 3 - Talking, 4 - Walking
      EXERCISE: [2] # 1 - True, 2 - False
      SKIN_COLOR: [3] # Fitzpatrick Scale Skin Types - 3, 4, 5, 6
      GENDER: [1, 2] # 1 - Male, 2 - Female
      GLASSER: [1, 2] # 1 - True, 2 - False
      HAIR_COVER: [1, 2] # 1 - True, 2 - False
      MAKEUP: [1, 2] # 1 - True, 2 - False
    FS: 30
    DATASET: MMPD
    DO_PREPROCESS: True # if first time, should be true
    DATA_FORMAT: NDCHW
    DATA_PATH: "./data/MMPD/raw_data" # Raw dataset path, needs to be updated
    CACHED_PATH: "./data/MMPD/preprocessed_data" # Processed dataset save path, needs to be updated
    FILE_LIST_PATH: "./data/MMPD/preprocessed_data/data_file_list" # Path to store file lists, needs to be updated
    EXP_DATA_NAME: ""
    BEGIN: 0.0
    END: 1.0
    PREPROCESS:
      DATA_TYPE: ['Standardized']
      LABEL_TYPE: Standardized
      DO_CHUNK: True
      CHUNK_LENGTH: 180
      CROP_FACE:
        DO_CROP_FACE: True
        USE_LARGE_FACE_BOX: True
        LARGE_BOX_COEF: 1.5
        DETECTION:
          DO_DYNAMIC_DETECTION: False
          DYNAMIC_DETECTION_FREQUENCY: 30
          USE_MEDIAN_FACE_BOX: False # This should be used ONLY if dynamic detection is used
      RESIZE:
        H: 72
        W: 72
and I modified the code a bit so that test_loader runs first and generates test_file_list, and then train_loader and val_loader run sequentially. That way the whole dataset is preprocessed first, and train_file_list and val_file_list are generated later without needing to preprocess the data again.
The bug happens after the whole dataset has been preprocessed and the test_file_list has been generated. Since I specified that the train and val data do not need DO_PREPROCESS, it will call build_file_list_retroactive() (the preprocessed files were generated in the previous step), and then the mentioned bug happens.
Can you try reproducing this bug without the changes you mentioned you made to test_loader, or elsewhere? I can't seem to reproduce this bug on my end, and I'm curious if something you might've changed locally might be causing it or if maybe there is some other small difference I'm not noticing.
Also, I'm a bit confused on two more things:
1) What is the config file you mentioned in your last reply supposed to be for? It doesn't seem like it's for proper intra-dataset training since it wouldn't make sense to use 80% for training, 20% for validation of the same data, and then just test on 100% of the same data. If that's supposed to be an intra-dataset config file, you should train on 60%, validate on 20%, and test on 20%, or just train on 80% and test on 20% without any kind of validation split if you end up not using validation.
2) Are you sure you have the latest version of the repo and still had to make the changes you mentioned you made? I'm especially confused since my understanding was that, if you did a proper intra-dataset split like 60%, 20%, 20%, no data would have to be preprocessed for a second time when preprocessing either of the latter two splits. If you wanted, you could even preprocess 100% of the MMPD data in the train split with certain filtering parameters (e.g., skin tones), and then use 100% for the subsequent splits with different filtering parameters (if for some reason you were alright with having the same subjects between splits, but different lighting conditions).
Generally I recommend using splits where subjects are unique to each split and, for MMPD in particular, you can filter in such a way that you only sub-select certain videos (e.g., with certain lighting, skin tone corresponding to the subject, etc.).
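As a rough illustration of the 60% / 20% / 20% split suggested above, here is a minimal sketch of how BEGIN and END fractions could map onto an ordered list of subjects. The helper name `select_split` and the subject names are hypothetical, and the slicing rule (`int(fraction * n)`) is an assumption about how the fractions are interpreted, not a copy of the toolbox code.

```python
def select_split(subjects, begin, end):
    """Return the subjects covered by the [begin, end) fractions of the list."""
    n = len(subjects)
    return subjects[int(begin * n):int(end * n)]


# Ten hypothetical subjects, in a fixed order shared by all three splits.
subjects = [f"subject{i}" for i in range(1, 11)]

train = select_split(subjects, 0.0, 0.6)  # first 60%
valid = select_split(subjects, 0.6, 0.8)  # next 20%
test = select_split(subjects, 0.8, 1.0)   # last 20%

print(len(train), len(valid), len(test))
```

Because the three ranges do not overlap, no subject appears in more than one split, which is the property recommended above.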
file_list generation without the need to process the video data again. It is strange that you cannot reproduce the bug, because I thought it should happen as long as build_file_list_retroactive() is called for MMPD. Anyway, I am copying my data preprocessing code here, which was modified based on main.py, since I want to separate data preprocessing from the training pipeline. You can run this with python preprocess_dataset.py --config_file CONFIG_NAME.yaml. The config file is the one I pasted in the previous reply.
"""
preprocess_dataset.py
The main function of rPPG dataset preprocessing.
"""
import os
import sys
import random
import argparse

import torch
import numpy as np

from config import get_config
from dataset import data_loader

RANDOM_SEED = 100
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Create a general generator for use with the validation dataloader,
# the test dataloader, and the unsupervised dataloader
general_generator = torch.Generator()
general_generator.manual_seed(RANDOM_SEED)
# Create a training generator to isolate the train dataloader from
# other dataloaders and better control non-deterministic behavior
train_generator = torch.Generator()
train_generator.manual_seed(RANDOM_SEED)


def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def add_args(parser):
    """Adds arguments for parser."""
    parser.add_argument(
        "--config_file",
        required=False,
        default="configs/train_configs/PURE_PURE_UBFC-rPPG_TSCAN_BASIC.yaml",
        type=str,
        help="The name of the model.",
    )
    """Neural Method Sample YAML LIST:
      SCAMPS_SCAMPS_UBFC-rPPG_TSCAN_BASIC.yaml
      SCAMPS_SCAMPS_UBFC-rPPG_DEEPPHYS_BASIC.yaml
      SCAMPS_SCAMPS_UBFC-rPPG_PHYSNET_BASIC.yaml
      SCAMPS_SCAMPS_PURE_DEEPPHYS_BASIC.yaml
      SCAMPS_SCAMPS_PURE_TSCAN_BASIC.yaml
      SCAMPS_SCAMPS_PURE_PHYSNET_BASIC.yaml
      PURE_PURE_UBFC-rPPG_TSCAN_BASIC.yaml
      PURE_PURE_UBFC-rPPG_DEEPPHYS_BASIC.yaml
      PURE_PURE_UBFC-rPPG_PHYSNET_BASIC.yaml
      PURE_PURE_MMPD_TSCAN_BASIC.yaml
      UBFC-rPPG_UBFC-rPPG_PURE_TSCAN_BASIC.yaml
      UBFC-rPPG_UBFC-rPPG_PURE_DEEPPHYS_BASIC.yaml
      UBFC-rPPG_UBFC-rPPG_PURE_PHYSNET_BASIC.yaml
      MMPD_MMPD_UBFC-rPPG_TSCAN_BASIC.yaml
    Unsupervised Method Sample YAML LIST:
      PURE_UNSUPERVISED.yaml
      UBFC-rPPG_UNSUPERVISED.yaml
    """
    return parser


if __name__ == "__main__":
    # Parse arguments.
    parser = argparse.ArgumentParser()
    parser = add_args(parser)
    parser = data_loader.BaseLoader.BaseLoader.add_data_loader_args(parser)
    args = parser.parse_args()

    # Configurations.
    config = get_config(args)
    print("Configuration:")
    print(config, end="\n\n")

    # test_loader
    test_loader = None  # stays None when no test dataset is configured
    if config.TEST.DATA.DATASET == "UBFC-rPPG":
        test_loader = data_loader.UBFCrPPGLoader.UBFCrPPGLoader
    elif config.TEST.DATA.DATASET == "PURE":
        test_loader = data_loader.PURELoader.PURELoader
    elif config.TEST.DATA.DATASET == "SCAMPS":
        test_loader = data_loader.SCAMPSLoader.SCAMPSLoader
    elif config.TEST.DATA.DATASET == "MMPD":
        test_loader = data_loader.MMPDLoader.MMPDLoader
    elif config.TEST.DATA.DATASET == "BP4DPlus":
        test_loader = data_loader.BP4DPlusLoader.BP4DPlusLoader
    elif config.TEST.DATA.DATASET == "BP4DPlusBigSmall":
        test_loader = data_loader.BP4DPlusBigSmallLoader.BP4DPlusBigSmallLoader
    elif config.TEST.DATA.DATASET == "UBFC-PHYS":
        test_loader = data_loader.UBFCPHYSLoader.UBFCPHYSLoader
    elif config.TEST.DATA.DATASET is None:
        print("Test dataset preprocessing is skipped.", end="\n\n")
    else:
        raise ValueError(
            "Unsupported dataset! Currently supporting UBFC-rPPG, PURE, MMPD, "
            "SCAMPS, BP4D+ (Normal and BigSmall preprocessing), and UBFC-PHYS."
        )
    # Create and initialize the test dataloader (this runs first, so the
    # whole dataset is preprocessed before the train and valid splits).
    if test_loader is not None:
        test_data = test_loader(
            name="test", data_path=config.TEST.DATA.DATA_PATH, config_data=config.TEST.DATA
        )

    # train_loader
    train_loader = None  # stays None when no train dataset is configured
    if config.TRAIN.DATA.DATASET == "UBFC-rPPG":
        train_loader = data_loader.UBFCrPPGLoader.UBFCrPPGLoader
    elif config.TRAIN.DATA.DATASET == "PURE":
        train_loader = data_loader.PURELoader.PURELoader
    elif config.TRAIN.DATA.DATASET == "SCAMPS":
        train_loader = data_loader.SCAMPSLoader.SCAMPSLoader
    elif config.TRAIN.DATA.DATASET == "MMPD":
        train_loader = data_loader.MMPDLoader.MMPDLoader
    elif config.TRAIN.DATA.DATASET == "BP4DPlus":
        train_loader = data_loader.BP4DPlusLoader.BP4DPlusLoader
    elif config.TRAIN.DATA.DATASET == "BP4DPlusBigSmall":
        train_loader = data_loader.BP4DPlusBigSmallLoader.BP4DPlusBigSmallLoader
    elif config.TRAIN.DATA.DATASET == "UBFC-PHYS":
        train_loader = data_loader.UBFCPHYSLoader.UBFCPHYSLoader
    elif config.TRAIN.DATA.DATASET is None:
        print("Train dataset preprocessing is skipped.", end="\n\n")
    else:
        raise ValueError(
            "Unsupported dataset! Currently supporting UBFC-rPPG, PURE, MMPD, "
            "SCAMPS, BP4D+ (Normal and BigSmall preprocessing), and UBFC-PHYS."
        )
    # Create and initialize the train dataloader.
    if train_loader is not None:
        train_data_loader = train_loader(
            name="train", data_path=config.TRAIN.DATA.DATA_PATH, config_data=config.TRAIN.DATA
        )

    # valid_loader
    valid_loader = None  # stays None when no validation dataset is configured
    if config.VALID.DATA.DATASET == "UBFC-rPPG":
        valid_loader = data_loader.UBFCrPPGLoader.UBFCrPPGLoader
    elif config.VALID.DATA.DATASET == "PURE":
        valid_loader = data_loader.PURELoader.PURELoader
    elif config.VALID.DATA.DATASET == "SCAMPS":
        valid_loader = data_loader.SCAMPSLoader.SCAMPSLoader
    elif config.VALID.DATA.DATASET == "MMPD":
        valid_loader = data_loader.MMPDLoader.MMPDLoader
    elif config.VALID.DATA.DATASET == "BP4DPlus":
        valid_loader = data_loader.BP4DPlusLoader.BP4DPlusLoader
    elif config.VALID.DATA.DATASET == "BP4DPlusBigSmall":
        valid_loader = data_loader.BP4DPlusBigSmallLoader.BP4DPlusBigSmallLoader
    elif config.VALID.DATA.DATASET == "UBFC-PHYS":
        valid_loader = data_loader.UBFCPHYSLoader.UBFCPHYSLoader
    elif config.VALID.DATA.DATASET is None:
        print("Validation dataset preprocessing is skipped.", end="\n\n")
    else:
        raise ValueError(
            "Unsupported dataset! Currently supporting UBFC-rPPG, PURE, MMPD, "
            "SCAMPS, BP4D+ (Normal and BigSmall preprocessing), and UBFC-PHYS."
        )
    # Create and initialize the valid dataloader.
    if valid_loader is not None:
        valid_data = valid_loader(
            name="valid", data_path=config.VALID.DATA.DATA_PATH, config_data=config.VALID.DATA
        )
Hi @Dylan-H-Wang,
I think I understand what you're saying now, but I should stress this seems more like a feature request than a bug per se. The base functionality you're describing as a bug actually works as intended: on each preprocessing run, no existing preprocessed data is reused, and the data is always preprocessed (as if for the first time) alongside a new data file list being generated. If there isn't an explicit data file list generated from such a preprocessing run, you have to either re-preprocess the data or create and/or modify a data file list yourself.
For your purposes, if you wanted, you could preprocess 100% of the dataset and then create the data file lists yourself using the instructions in this section of the README. We can of course make this easier to do in the future by automating the actual data file list generation process while noting data that has already been preprocessed, if the user specifies that.
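As a rough idea of what creating a data file list by hand could look like, here is a minimal sketch that scans a folder of preprocessed files and writes their paths to a CSV. Everything specific here is an assumption to check against the README: the helper name `write_file_list` is hypothetical, the `*input*.npy` naming pattern and the `input_files` column header are guesses at the expected layout, and no subject-level splitting or filtering is done.

```python
import csv
import glob
import os


def write_file_list(cached_dir, list_path):
    """Write the paths of preprocessed input chunks in `cached_dir` to a CSV.

    Assumes (hypothetically) that preprocessed inputs match "*input*.npy"
    and that the file list is a single-column CSV headed "input_files".
    """
    inputs = sorted(glob.glob(os.path.join(cached_dir, "*input*.npy")))
    if not inputs:
        raise ValueError(f"No preprocessed input files found in {cached_dir}")

    # Create the target folder for the file list if it does not exist yet.
    os.makedirs(os.path.dirname(list_path) or ".", exist_ok=True)
    with open(list_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["input_files"])
        writer.writerows([path] for path in inputs)
    return inputs


# Example call (paths taken from the config above, list file name is made up):
# write_file_list("./data/MMPD/preprocessed_data",
#                 "./data/MMPD/preprocessed_data/data_file_list/all_files.csv")
```

To keep subjects unique to each split, the list of paths would still need to be partitioned by subject before writing one file list per split.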
I'm going to go ahead and close this issue while noting this as a possible feature in the future, but let me know if I somehow still misunderstood the problem at hand.
Hi,
I found the program will throw an error
ValueError: ('train', 'File list empty. Check preprocessed data folder exists and is not empty.')
when the config file sets DO_PREPROCESS: False for MMPD but not for UBFC-rPPG.
I located the bug at https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/dataset/data_loader/BaseLoader.py#L487
since in https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/dataset/data_loader/MMPDLoader.py#L56 the dict key index is used for the video index of each subject instead of the subject index. In the UBFC-rPPGLoader, the index is defined by
dirs = [{"index": re.search('subject(\d+)', data_dir).group(0), "path": data_dir} for data_dir in data_dirs]
which would include the string subject, but this is not the case for MMPDLoader, which I think causes this bug.
I hard-coded the block to solve this issue by changing two lines; the commented lines are the original code.
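As an illustration of the key mismatch described above, and not the two-line change itself, here is a small sketch of why the value stored under index matters. The cached file names and the `lookup` helper are hypothetical, and the `"{index}_input*.npy"` pattern is an assumption about how a retroactive lookup might match cached files; only the `re` and `fnmatch` behaviour is standard.

```python
import fnmatch
import re

# Hypothetical cached files produced by an earlier preprocessing run.
saved_files = ["subject12_input0.npy", "subject12_input1.npy"]

match = re.search(r"subject(\d+)", "raw_data/subject12")
full_index = match.group(0)    # whole match, includes the "subject" prefix
digits_index = match.group(1)  # only the captured digits


def lookup(index):
    """Return cached files whose names start with `index` (assumed pattern)."""
    return fnmatch.filter(saved_files, f"{index}_input*.npy")


print(lookup(full_index))    # finds both cached files
print(lookup(digits_index))  # finds nothing, so a file list would be empty
```

If a loader stores something other than the saved file name prefix under index, a lookup like this comes back empty, which is consistent with the "File list empty" error reported above.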