ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

problems about preprocessing and test on UBFC dataset #133

Closed GaoXu007 closed 1 year ago

GaoXu007 commented 1 year ago

Dear author, thanks for your dedication, I believe such a valuable tool will promote the development of the rPPG field and allow researchers to more effectively build upon existing work.

I have encountered a few issues while using the tool, and I was hoping to seek your guidance in resolving them.

  1. When I followed the steps in readme.md to preprocess the UBFC dataset, I encountered some warnings/errors, but the program was still able to continue running. What caused this? Do these warnings have an impact on the experimental results? (screenshot: preprocess_dataset_problem)

  2. Apart from DO_PREPROCESS, DATA_PATH, and CACHED_PATH, I kept the default parameters in UBFC_UNSUPERVISED.yaml without making any changes, but the results were not as good as yours (see the figure below). For example, the MAE and RMSE of POS and CHROM in my results are: POS: 4.00 / 7.58, CHROM: 3.98 / 8.72 (screenshot: Table2). Am I overlooking some details, or are your parameters different from mine?

  3. When using the pre-trained model via PURE_UBFC_TSCAN_BASIC.yaml, the inference/predict process is very slow, taking up to 2 hours. Is this normal? I ran the program on a server.

  4. I noticed that the framerates of different videos in the UBFC dataset are inconsistent. For example, subject12's framerate is 29 and subject23's framerate is 28.9, but the FS parameter in the .yaml file is fixed at 30. Does this affect the results?

  5. Each video produces an MAE value, and the MAE in the final result is the average of these per-video MAEs. Am I correct? (This may be a naive question...)

  6. Is the frame-resizing operation (cv2.resize) necessary? What is its role? Can I use raw frames? How were W=72 and H=72 determined?

I appreciate any guidance you can provide. Thanks.

yahskapar commented 1 year ago

Hi @GaoXu007,

Thanks for using the toolbox!

  1. That's a strange error that I've never seen before; are you able to reproduce it? I'm unable to reproduce it on my end. The terminal output I get when using UBFC_UNSUPERVISED.yaml to compute MAE, RMSE, MAPE, and Pearson correlation with POS is below:
Preprocessing dataset...
  5%|██████▏                                                                                                                           | 2/42 [00:38<10:50, 16.26s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
  7%|█████████▎                                                                                                                        | 3/42 [00:38<05:54,  9.08s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 19%|████████████████████████▊                                                                                                         | 8/42 [00:40<00:47,  1.39s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 26%|█████████████████████████████████▊                                                                                               | 11/42 [01:12<03:30,  6.80s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 64%|██████████████████████████████████████████████████████████████████████████████████▉                                              | 27/42 [02:21<02:28,  9.89s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
Warning: More than one faces are detected(Only cropping the biggest one.)
 83%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▌                     | 35/42 [02:40<00:28,  4.08s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
Warning: More than one faces are detected(Only cropping the biggest one.)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [02:58<00:00,  4.26s/it]
Total Number of raw files preprocessed: 42

Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/UBFC_SizeW72_SizeH72_ClipLength180_DataTypeRaw_LabelTypeRaw_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len180_unsupervised

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/DataFileLists/UBFC_SizeW72_SizeH72_ClipLength180_DataTypeRaw_LabelTypeRaw_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len180_unsupervised_0.0_1.0.csv
 unsupervised Preprocessed Dataset Length: 42

===Unsupervised Method ( POS ) Predicting ===
100%|███████████████████████████████████████████| 42/42 [04:10<00:00,  5.96s/it]
Used Unsupervised Method: POS
FFT MAE (FFT Label):3.9969308035714284
FFT RMSE (FFT Label):7.5831059532071405
FFT MAPE (FFT Label):3.8622851481891742
FFT Pearson  (FFT Label):0.9224921893686093

I've only ever seen that error when using ffmpeg for other purposes, so I'm not sure how you managed to see this error within the context of the rPPG-Toolbox. Perhaps OpenCV can also report a similar error in certain scenarios. You may benefit from starting with a fresh copy of this repo, re-running setup.sh, and re-installing the required dependencies as per the README. Please make sure you are also using the latest version of this toolbox by checking the commit history using git log.

Also, can you paste your entire terminal output when doing some step with the toolbox such as pre-processing? I'm a bit confused by the output you have provided since it doesn't seem like you are getting the usual terminal output I would expect when doing pre-processing steps such as face cropping.

  2. Those results look good to me; they differ from the table you referenced because of the numerous changes to the rPPG-Toolbox (code re-factoring, bug fixes, etc.) made since that table was generated. The table in the arXiv paper is outdated and will be updated in the coming months. Here's a picture (for reference purposes only) of our new table from the ongoing draft:

    Screenshot 2023-03-20 at 10 53 21 PM
  3. That doesn't sound normal at all. Please give us more information regarding your computing environment. For example, what kind of GPU do you have?

  4. That shouldn't affect the results significantly.

  5. Assuming you're using the default FFT evaluation method specified in the YAML config file, two heart rate values are calculated for each video: one based on the FFT of the ground-truth PPG waveform and another based on the FFT of the predicted PPG waveform. The final MAE is then calculated as the mean of the absolute differences between the predicted HRs and the label HRs. You can refer to evaluation/metrics.py and evaluation/post_process.py in the toolbox for more details; take a look at the calculate_metrics() function in particular.

  6. I believe it is necessary in order to allow fair comparisons between the various baselines in the rPPG-Toolbox. If you wanted to, and had enough memory, you could use the raw size by changing the width and height parameters in the YAML config file to a width of 640 and a height of 480 (the resolution of UBFC-rPPG videos). My understanding is that the default of 72x72 mainly exists to allow a fair comparison with various neural methods that rely on a 72x72 pre-processed input, such as TS-CAN.
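On the FS question above, here's a back-of-the-envelope sketch (fft_hr is a simplified stand-in for an FFT-peak heart rate estimator, not actual toolbox code): evaluating a 28.9 fps video while assuming FS=30 scales the FFT-based HR estimate by 30/28.9, i.e. a proportional bias of roughly 3.8% (about 2.7 bpm at 70 bpm).

```python
import numpy as np

# Sketch (not toolbox code): fft_hr is a simplified stand-in for an
# FFT-peak heart rate estimator. It shows that evaluating a 28.9 fps
# video with an assumed FS of 30 scales the HR estimate by 30 / 28.9.
def fft_hr(signal, fs, lo_hz=0.75, hi_hz=2.5):
    """Estimate HR (bpm) from the dominant FFT peak in the passband."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * freqs[band][np.argmax(power[band])]

true_fs, assumed_fs, true_hr = 28.9, 30.0, 70.0
n = round(60 * true_fs)                           # 60 seconds of samples
t = np.arange(n) / true_fs
ppg = np.sin(2.0 * np.pi * (true_hr / 60.0) * t)  # clean 70 bpm pulse

hr_correct = fft_hr(ppg, true_fs)    # ~70.0 bpm
hr_biased = fft_hr(ppg, assumed_fs)  # ~70 * 30 / 28.9, i.e. ~72.7 bpm
```

Since every method in a comparison is evaluated with the same FS, the bias largely cancels out in relative comparisons, which is consistent with it not mattering much in practice.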
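To make the metric computation described above concrete, here is a toy sketch mirroring that logic (the HR numbers are made up; in the toolbox the real values come from calculate_metrics()):

```python
import numpy as np

# Hypothetical per-video heart rates in bpm: for each test video, one HR
# comes from the FFT of the predicted PPG and one from the FFT of the
# ground-truth (label) PPG.
predicted_hrs = np.array([72.0, 65.0, 90.0])
label_hrs = np.array([70.0, 66.0, 85.0])

# Final MAE = mean of the per-video absolute HR differences
mae = np.mean(np.abs(predicted_hrs - label_hrs))          # (2 + 1 + 5) / 3
# RMSE is computed over the same per-video differences
rmse = np.sqrt(np.mean((predicted_hrs - label_hrs) ** 2))
```

So the averaging is over per-video HR errors, not over per-video MAE values, though for a single HR estimate per video the two views coincide.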

girishvn commented 1 year ago

Hi @GaoXu007,

I agree with everything @yahskapar mentioned. One quick add-on regarding 6):

The reasons for down-sampling the frames are threefold: 1) to allow a fair comparison between networks, 2) to allow an increased number of samples per mini-batch (accounting for GPU/device memory constraints), and 3) spatial downsampling/averaging has historically been used by the rPPG community as a means of filtering high-frequency spatial noise, thus improving the SNR of the pulse signal.
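To illustrate point 3), here is a minimal sketch (block averaging stands in for the toolbox's cv2.resize here; spatial_downsample is a hypothetical helper, not toolbox code): averaging each block of pixels shrinks per-pixel noise by roughly the square root of the block size.

```python
import numpy as np

def spatial_downsample(frame, size=72):
    """Block-average a frame down to size x size (illustrative stand-in
    for cv2.resize; hypothetical helper, not toolbox code)."""
    h, w = frame.shape[:2]
    bh, bw = h // size, w // size
    cropped = frame[: bh * size, : bw * size]
    return cropped.reshape(size, bh, size, bw, -1).mean(axis=(1, 3))

# A synthetic noisy 480x640 "frame": constant skin tone plus pixel noise
rng = np.random.default_rng(0)
frame = 120.0 + rng.normal(0.0, 20.0, size=(480, 640, 3))

small = spatial_downsample(frame)  # shape (72, 72, 3)
# Averaging 6x8 pixel blocks cuts the per-pixel noise by ~sqrt(48),
# i.e. roughly 7x, which is the SNR benefit point 3) refers to.
```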

One more quick note: if you do decide to use larger frame sizes or raw inputs, you will have to change the number of parameters in the fully connected layers of the models you are using, as the FC layer neuron counts depend on the input size.

Hope this helps!

GaoXu007 commented 1 year ago

Hi @girishvn @yahskapar, I truly appreciate your willingness to share your knowledge. Thank you for taking the time to answer my questions.

I re-installed the latest version of the toolbox, and I noticed my experimental results are now identical to yours. The results of the POS method and TS-CAN are shown in the following two figures.

preprocess_terminal_output

preprocess_pretrained

However, when preprocessing the dataset, there are still warnings (you can see them in the screenshots above). This may seem strange, but it doesn't seem to affect the final result.

@yahskapar asked for my computer environment information, and I posted it here:

CPU

GPU

Besides, I tested the running time of the TS-CAN program, and it took 66 minutes (including the preprocessing time), which is still a bit long. Maybe it's because someone else is also using the server? How long does it take you to run this program?

In addition, I have a question, @yahskapar: how do you paste terminal output into a comment while preserving its formatting? I tried to paste it directly, but the formatting was garbled, so I had to use screenshots. How do you do that?

yahskapar commented 1 year ago

Great to hear that the results look as expected now. Regarding the warning you still see, can you navigate to your BaseLoader.py file and change multi_process_quota=8 to multi_process_quota=1 on this line? That should disable multi-processing for our purposes and only use a single process for preprocessing the dataset. Perhaps the warning will disappear then, in which case I think it's indicative that this issue is unique to your computing environment and possibly an issue with CPU resources (especially if there are 24 CPUs, 2 threads per core, and this is a shared cluster).

I also wonder if the warning you still see has something to do with how you downloaded the UBFC-rPPG data itself - maybe you can try redownloading the first 8 or so folders in your UBFC-rPPG dataset folder to see if the warning persists? It's possible a video in one of the subject folders is somehow corrupt enough to trigger that warning. I recommend trying only the first 8 or so subject folders in your dataset folder since, based on the images you provided, it seems this warning is triggered within the first 8 videos that get pre-processed. Avoiding multi-processing as mentioned before will also help narrow down which video exactly this triggers on. Printing out the saved_filename variable here may help with debugging if you want to know for sure which subject video might be causing issues.
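If a video is indeed truncated, the decoder's byte math makes it easy to spot. Here's a small sketch (raw_frame_bytes is illustrative, not toolbox code): ffmpeg-style errors of the form "packet size ... < expected frame_size" compare a decoded packet's byte count against the size of one raw frame.

```python
# Sanity-check sketch (not toolbox code): the expected byte count of one
# raw BGR frame is just width * height * channels. A decoder packet
# smaller than this signals a truncated frame in the video file.
def raw_frame_bytes(width, height, channels=3):
    return width * height * channels

# UBFC-rPPG videos are 640x480, so one raw frame is 921600 bytes
ubfc_frame_bytes = raw_frame_bytes(640, 480)
```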

Besides, I tested the running time of the TS-CAN program, and it took 66 mins(including the preprocessing time), which is still a bit long. Maybe it's because someone else is also using the server? How long does it take you to run this program?

How long does it take excluding the preprocessing time? Preprocessing can take a while depending on your computing environment, so perhaps that's where the majority of those 66 minutes are spent. Inference using a pre-trained TS-CAN model should be much faster than 66 minutes, and in general for me on a cluster with 128 CPUs and multiple NVIDIA RTX A4500s and A6000s, takes 4-5 minutes with the default inference batch size of 4. I should emphasize those 4-5 minutes are without factoring in preprocessing.

In addition, I have a question. @yahskapar , how do you do this by pasting the terminal output results in a reserved format into the comments? I tried to paste it directly, but the format was in disorder, so I had to use screenshots......how do you do that?

I wrapped the text copied from the terminal with ``` above the text and ``` below the text. Some more info here.

GaoXu007 commented 1 year ago

Hi @yahskapar, you guessed right! I checked the videos one by one and finally found the damaged one: subject33 (although this video can still be played normally). You can draw this conclusion from the terminal output below.

Preprocessing dataset...
  0%|                                                                                                                                                                      | 0/1 [00:00<?, ?it/s]
saved_filename:  subject33
[rawvideo @ 0x4c84e40] Invalid buffer size, packet size 453007 < expected frame_size 921600
Warning: More than one faces are detected(Only cropping the biggest one.)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:30<00:00, 30.05s/it]
Total Number of raw files preprocessed: 1

Cached Data Path /home/gx18/PreprocessedData/UBFC_SizeW72_SizeH72_ClipLength180_DataTypeRaw_LabelTypeRaw_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len180_unsupervised

File List Path /home/gx18/PreprocessedData/DataFileLists/UBFC_SizeW72_SizeH72_ClipLength180_DataTypeRaw_LabelTypeRaw_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len180_unsupervised_0.0_1.0.csv
 unsupervised Preprocessed Dataset Length: 1

===Unsupervised Method ( POS ) Predicting ===

To solve this problem, I downloaded subject33's video from the link provided by the UBFC authors, but an error was reported after decompressing the zip file. This seems strange, perhaps due to the compression. Finally, I directly downloaded the uncompressed video, and it works. When I run the program again, the warning no longer appears; see the terminal output below.

Preprocessing dataset...
  0%|                                                                                                                                                                     | 0/42 [00:00<?, ?it/s]saved_filename:  subject33
saved_filename:  subject41
saved_filename:  subject5
saved_filename:  subject48
saved_filename:  subject14
saved_filename:  subject13
saved_filename:  subject4
saved_filename:  subject44
Warning: More than one faces are detected(Only cropping the biggest one.)
  2%|███▋                                                                                                                                                      | 1/42 [01:58<1:21:01, 118.58s/it]saved_filename:  subject9
  5%|███████▍                                                                                                                                                     | 2/42 [02:03<34:25, 51.63s/it]saved_filename:  subject43
  7%|███████████▏                                                                                                                                                 | 3/42 [02:25<24:50, 38.22s/it]saved_filename:  subject23
 10%|██████████████▉                                                                                                                                              | 4/42 [02:42<18:58, 29.96s/it]saved_filename:  subject34
Warning: More than one faces are detected(Only cropping the biggest one.)
 12%|██████████████████▋                                                                                                                                          | 5/42 [03:24<21:01, 34.10s/it]saved_filename:  subject42
 14%|██████████████████████▍                                                                                                                                      | 6/42 [03:44<17:33, 29.27s/it]saved_filename:  subject37
 17%|██████████████████████████▏                                                                                                                                  | 7/42 [03:44<11:37, 19.93s/it]saved_filename:  subject39
Warning: More than one faces are detected(Only cropping the biggest one.)
 19%|█████████████████████████████▉                                                                                                                               | 8/42 [04:17<13:32, 23.90s/it]saved_filename:  subject15
 21%|█████████████████████████████████▋                                                                                                                           | 9/42 [05:45<24:16, 44.12s/it]saved_filename:  subject1
 24%|█████████████████████████████████████▏                                                                                                                      | 10/42 [06:01<18:51, 35.36s/it]saved_filename:  subject22
 26%|████████████████████████████████████████▊                                                                                                                   | 11/42 [06:03<12:55, 25.02s/it]saved_filename:  subject46
 29%|████████████████████████████████████████████▌                                                                                                               | 12/42 [06:48<15:35, 31.20s/it]saved_filename:  subject11
 31%|████████████████████████████████████████████████▎                                                                                                           | 13/42 [07:03<12:45, 26.40s/it]saved_filename:  subject25
Warning: More than one faces are detected(Only cropping the biggest one.)
 33%|████████████████████████████████████████████████████                                                                                                        | 14/42 [07:32<12:40, 27.18s/it]saved_filename:  subject26
Warning: More than one faces are detected(Only cropping the biggest one.)
 36%|███████████████████████████████████████████████████████▋                                                                                                    | 15/42 [07:39<09:29, 21.09s/it]saved_filename:  subject18
 38%|███████████████████████████████████████████████████████████▍                                                                                                | 16/42 [07:50<07:48, 18.03s/it]saved_filename:  subject10
 40%|███████████████████████████████████████████████████████████████▏                                                                                            | 17/42 [08:58<13:44, 32.98s/it]saved_filename:  subject24
 43%|██████████████████████████████████████████████████████████████████▊                                                                                         | 18/42 [10:05<17:13, 43.05s/it]saved_filename:  subject3
saved_filename:  subject36
 48%|██████████████████████████████████████████████████████████████████████████▎                                                                                 | 20/42 [10:05<08:32, 23.28s/it]saved_filename:  subject45
 50%|██████████████████████████████████████████████████████████████████████████████                                                                              | 21/42 [10:24<07:47, 22.24s/it]saved_filename:  subject31
 52%|█████████████████████████████████████████████████████████████████████████████████▋                                                                          | 22/42 [11:14<09:51, 29.58s/it]saved_filename:  subject16
 55%|█████████████████████████████████████████████████████████████████████████████████████▍                                                                      | 23/42 [11:26<07:48, 24.67s/it]saved_filename:  subject12
 57%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                                  | 24/42 [11:32<05:52, 19.56s/it]saved_filename:  subject49
 60%|████████████████████████████████████████████████████████████████████████████████████████████▊                                                               | 25/42 [12:14<07:21, 25.98s/it]saved_filename:  subject40
 62%|████████████████████████████████████████████████████████████████████████████████████████████████▌                                                           | 26/42 [13:13<09:28, 35.53s/it]saved_filename:  subject17
Warning: More than one faces are detected(Only cropping the biggest one.)
 64%|████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                       | 27/42 [13:51<09:03, 36.22s/it]saved_filename:  subject8
 67%|████████████████████████████████████████████████████████████████████████████████████████████████████████                                                    | 28/42 [13:52<05:59, 25.67s/it]saved_filename:  subject30
 69%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                                | 29/42 [13:52<03:56, 18.21s/it]saved_filename:  subject20
Warning: More than one faces are detected(Only cropping the biggest one.)
 71%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 30/42 [14:27<04:36, 23.08s/it]saved_filename:  subject35
 74%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 31/42 [14:51<04:17, 23.38s/it]saved_filename:  subject32
 76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                     | 32/42 [15:13<03:50, 23.04s/it]saved_filename:  subject27
 79%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                 | 33/42 [16:04<04:41, 31.23s/it]saved_filename:  subject47
 81%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                             | 34/42 [16:26<03:48, 28.57s/it]saved_filename:  subject38
 93%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊           | 39/42 [17:37<00:42, 14.11s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [18:39<00:00, 26.66s/it]
Total Number of raw files preprocessed: 42

Cached Data Path /home/gx18/PreprocessedData/UBFC_SizeW72_SizeH72_ClipLength180_DataTypeDiffNormalized_Standardized_LabelTypeDiffNormalized_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len180

File List Path /home/gx18/PreprocessedData/DataFileLists/UBFC_SizeW72_SizeH72_ClipLength180_DataTypeDiffNormalized_Standardized_LabelTypeDiffNormalized_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len180_0.0_1.0.csv
 test Preprocessed Dataset Length: 439

===Testing===
Testing uses pretrained model!

You have a keen eye for this issue! Thanks.

I tested the inference time of the TS-CAN model again, but it still takes a long time (53 minutes); see the terminal output below.

===Testing===
Testing uses pretrained model!

FFT MAE (FFT Label):1.2974330357142858
FFT RMSE (FFT Label):2.8704957923240366
FFT MAPE (FFT Label):1.500155568072648
FFT Pearson (FFT Label):0.9890524988652464

real    53m13.441s

What are the possible reasons for this? My machine (a shared cluster) shouldn't have such poor performance. Can you give me some hints?

Thank you again for taking the time to help me solve the problem.

yahskapar commented 1 year ago

Good to hear the warning no longer appears after a direct download of the uncompressed subject33 video.

Regarding the inference time alone taking that long, that's pretty strange. I don't personally have the same or similar hardware that you have, so I'm not sure if that inference time is reproducible by anyone else. What's the exact command you're using to run the toolbox? Is it something like python main.py --config_file ./configs/train_configs/PURE_PURE_UBFC_TSCAN_BASIC.yaml? Can you try adding CUDA_VISIBLE_DEVICES in front of that command (e.g., CUDA_VISIBLE_DEVICES=0 python main.py --config_file ./configs/train_configs/PURE_PURE_UBFC_TSCAN_BASIC.yaml) to make sure the GPU is actually being used properly?

You can also try using a GPU on your cluster that isn't being used by someone else based on nvidia-smi. Also, since your cluster is shared, when you notice these long inference times are there any other heavy computational loads (e.g., on the GPUs, on the CPUs, or with I/O operations possibly) taking place? I'm assuming based on your previous posts, by the way, that you have not modified any of the defaults for data chunking in the YAML config files that you are using.
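As a side note on how CUDA_VISIBLE_DEVICES behaves, here is a minimal sketch (visible_gpus is a hypothetical helper for illustration, not part of the toolbox or CUDA): the variable restricts which physical GPUs the process can see, and the visible devices are re-indexed from 0, which is why cuda:0 inside the process maps to whichever physical GPU is listed first.

```python
# Hypothetical helper (not toolbox code): mimics how CUDA_VISIBLE_DEVICES
# restricts which physical GPUs a process sees. When the variable is
# unset, all physical devices are visible.
def visible_gpus(env_value, physical_count):
    if env_value is None:
        return list(range(physical_count))
    ids = [int(x) for x in env_value.split(",") if x.strip()]
    return [i for i in ids if 0 <= i < physical_count]

# e.g. CUDA_VISIBLE_DEVICES=0 on a 4-GPU node exposes only physical GPU 0
only_gpu0 = visible_gpus("0", 4)
all_gpus = visible_gpus(None, 4)
```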

GaoXu007 commented 1 year ago

hi, @yahskapar

I ran the code according to your suggestion. The command and corresponding nvidia-smi terminal output are as follows:

CUDA_VISIBLE_DEVICES=0 python main.py --config_file ./configs/train_configs/PURE_UBFC_TSCAN_BASIC.yaml

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:19:00.0 Off |                  N/A |
| 13%   33C    P8     6W / 257W |   5103MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 13%   34C    P8     8W / 257W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:67:00.0 Off |                  N/A |
| 13%   37C    P8     7W / 257W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:68:00.0  On |                  N/A |
| 15%   44C    P8    19W / 257W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2332091      C   python                           5100MiB |
+-----------------------------------------------------------------------------+

Without CUDA_VISIBLE_DEVICES in front of the command, e.g., python main.py --config_file ./configs/infer_configs/PURE_UBFC_TSCAN_BASIC.yaml (please note that by default I am using GPU0, while GPU1, GPU2, and GPU3 are being utilized by others, as can be seen from the following results):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:19:00.0 Off |                  N/A |
| 17%   45C    P2    47W / 257W |   5103MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 29%   62C    P2   107W / 257W |   2785MiB / 11264MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:67:00.0 Off |                  N/A |
| 20%   52C    P2    52W / 257W |   1541MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:68:00.0  On |                  N/A |
| 27%   62C    P2   105W / 257W |   2773MiB / 11264MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3329489      C   python                           5100MiB |
|    1   N/A  N/A   3300013      C   xingzai                          2776MiB |
|    2   N/A  N/A   3300014      C   xingzai                          1530MiB |
|    3   N/A  N/A   3300015      C   xingzai                          2766MiB |
+-----------------------------------------------------------------------------+

In order to utilize all 4 GPUs, I modified the value of NUM_OF_GPU_TRAIN in PURE_UBFC_TSCAN_BASIC.yaml from 1 to 4. My command to run the toolbox (please note that I am now utilizing all 4 GPUs): python main.py --config_file ./configs/infer_configs/PURE_UBFC_TSCAN_BASIC.yaml

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:19:00.0 Off |                  N/A |
| 13%   34C    P8     6W / 257W |   2019MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 13%   36C    P8     8W / 257W |   1901MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:67:00.0 Off |                  N/A |
| 13%   38C    P8     6W / 257W |   1901MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:68:00.0  On |                  N/A |
| 16%   43C    P8    18W / 257W |   1901MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2405982      C   python                           2016MiB |
|    1   N/A  N/A   2405982      C   python                           1898MiB |
|    2   N/A  N/A   2405982      C   python                           1898MiB |
|    3   N/A  N/A   2405982      C   python                           1898MiB |
+-----------------------------------------------------------------------------+

Based on the above results, it is evident that the code will utilize GPU0 whether or not CUDA_VISIBLE_DEVICES is added before the command. Furthermore, changing the value of NUM_OF_GPU_TRAIN in the PURE_UBFC_TSCAN_BASIC.yaml file will modify the number of available GPUs accordingly.

Unfortunately, despite using all 4 GPUs, the inference process still takes 53 minutes, only slightly faster than the 56 minutes it takes with a single GPU. This raises the question: why aren't multiple GPUs leading to significant time savings?

I'm also feeling a bit confused about a few of the steps in readme.md, and I'm not entirely sure what the YAML naming conventions mean (the following quote is taken from the readme.md).

For example, if you want to run The model trained on PURE and tested on UBFC, use python main.py --config_file ./configs/infer_configs/PURE_PURE_UBFC_TSCAN_BASIC.yaml

However, I'm unable to locate a file named PURE_PURE_UBFC_TSCAN_BASIC.yaml in the '/infer_configs/' directory. Instead, I only see a file named PURE_UBFC_TSCAN_BASIC.yaml. I'm also noticing that all of the YAML files in '/train_configs' directory follow a naming convention that's similar to PURE_PURE_UBFC_TSCAN_BASIC, with duplicated prefixes. Can you help me understand the meaning behind this naming convention?

I should emphasize that when I ran the toolbox, I used the command python main.py --config_file ./configs/infer_configs/PURE_UBFC_TSCAN_BASIC.yaml, not a config from "train_configs" and not "PURE_PURE_UBFC_TSCAN_BASIC.yaml".

Thank you for taking the time to answer my question. Have a nice day!

yahskapar commented 1 year ago

Hi @GaoXu007,

Everything you posted with respect to nvidia-smi looks sane, so I don't think the problem has to do with the GPU usage itself. I wouldn't necessarily expect utilizing more GPUs to affect the speed of inference in this case (where you are seeing such a long inference time at nearly an hour), as usually that really only helps if your computing environment is constrained in terms of GPU usage (e.g., not enough memory to run some code using a single GPU).

My guess at this point is that there is some other bottleneck in your computing environment. Have you used top or htop to monitor CPU and memory usage on your cluster? A short, basic, and by no means exhaustive introduction to both tools can be found here.
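As a quick first check before (or alongside) top/htop, you could compare the machine's load average to its core count; a sustained ratio well above 1.0 suggests CPU contention from other users. This is a generic Linux-only sketch of mine, not toolbox code:

```python
import os

def cpu_contention_ratio():
    """Return the 1-minute load average divided by the CPU core count.

    A ratio well above 1.0 suggests the machine is CPU-bound, which can
    starve the data-loading side of inference even while GPUs sit idle.
    Linux-only: reads /proc/loadavg.
    """
    with open("/proc/loadavg") as f:
        load_1min = float(f.read().split()[0])
    return load_1min / os.cpu_count()

ratio = cpu_contention_ratio()
print(f"load/cores ratio: {ratio:.2f}"
      + (" (CPU likely saturated)" if ratio > 1.0 else ""))
```

On a shared cluster, running this (or just watching htop) while inference is in progress is usually enough to tell whether other users' jobs are the bottleneck.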

Also, regarding the naming conventions within train_configs and infer_configs, the names effectively correspond to what the config file is meant to do with the noted datasets. For example, PURE_PURE_UBFC_TSCAN_BASIC.yaml basically means that the config uses the PURE dataset for training and validation, tests on UBFC-rPPG, and uses the TS-CAN neural method. You can think of the naming convention for the training configs as [TRAINING]_[VALIDATION]_[TEST]_[METHOD]_BASIC.yaml. For the infer configs it's quite similar, but since it's only inference, all you need to know is the dataset the pre-trained model was trained on and the test dataset you want to test said pre-trained model on. For example, in the case of PURE_UBFC_TSCAN_BASIC.yaml, it's a config for a pre-trained TS-CAN model that was trained using PURE and will be tested on UBFC-rPPG. We will try to refine the naming conventions and provide better documentation for them in the near future, but hopefully my explanation makes sense for now.
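The convention above can be sketched as a tiny filename parser (the helper name and return format are mine for illustration, not part of the toolbox):

```python
def parse_config_name(filename):
    """Split a toolbox config filename into its dataset/method parts.

    Train configs: [TRAINING]_[VALIDATION]_[TEST]_[METHOD]_BASIC.yaml
    Infer configs: [TRAINED_ON]_[TEST]_[METHOD]_BASIC.yaml
    (Helper name and return format are illustrative, not toolbox code.)
    """
    parts = filename.replace(".yaml", "").split("_")
    if parts[-1] != "BASIC":
        raise ValueError(f"unexpected config name: {filename}")
    if len(parts) == 5:   # train config
        train, valid, test, method, _ = parts
        return {"train": train, "valid": valid, "test": test, "method": method}
    if len(parts) == 4:   # infer config
        train, test, method, _ = parts
        return {"train": train, "test": test, "method": method}
    raise ValueError(f"unexpected config name: {filename}")

print(parse_config_name("PURE_PURE_UBFC_TSCAN_BASIC.yaml"))
# {'train': 'PURE', 'valid': 'PURE', 'test': 'UBFC', 'method': 'TSCAN'}
print(parse_config_name("PURE_UBFC_TSCAN_BASIC.yaml"))
# {'train': 'PURE', 'test': 'UBFC', 'method': 'TSCAN'}
```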

yahskapar commented 1 year ago

For reference, I did some testing earlier today using PURE_UBFC_TSCAN_BASIC.yaml and the latest version of the toolbox on a new branch of mine and had the below run-time with DO_PREPROCESS set to False (since I had already pre-processed the PURE dataset):

===Testing===
Testing uses pretrained model!

FFT MAE (FFT Label):1.2974330357142858
FFT RMSE (FFT Label):2.8704957923240366
FFT MAPE (FFT Label):1.500155568072648
FFT Pearson (FFT Label):0.9890524988652464
Completed main.py with PURE_UBFC_TSCAN_BASIC.yaml in 90.4227 seconds

A minute and 30 seconds is a pretty big difference from what you're seeing, so I again recommend you track your CPU and memory usage to see if that may be the source of your issue. I'm also generally curious whether you get similarly long run-times using the UBFC_UNSUPERVISED.yaml config, or a config that tests on PURE (e.g., UBFC_PURE_TSCAN_BASIC.yaml) if you have the PURE dataset downloaded.
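For anyone wondering how summary numbers like the ones above are formed, here is a minimal sketch of the standard MAE/RMSE/MAPE/Pearson formulas applied to per-video heart-rate estimates (illustrative only, not the toolbox's evaluation code; the final MAE is indeed an average over per-video errors):

```python
import math

def hr_metrics(pred_hr, label_hr):
    """Compute MAE, RMSE, MAPE (%), and Pearson r over per-video
    heart rates in bpm. Formulas are the standard definitions,
    not copied from the toolbox source."""
    n = len(pred_hr)
    errs = [p - l for p, l in zip(pred_hr, label_hr)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mape = 100.0 * sum(abs(e) / l for e, l in zip(errs, label_hr)) / n
    mp, ml = sum(pred_hr) / n, sum(label_hr) / n
    cov = sum((p - mp) * (l - ml) for p, l in zip(pred_hr, label_hr))
    norm = math.sqrt(sum((p - mp) ** 2 for p in pred_hr)
                     * sum((l - ml) ** 2 for l in label_hr))
    return mae, rmse, mape, cov / norm

# Toy example with three hypothetical videos:
mae, rmse, mape, r = hr_metrics([72.0, 80.0, 95.0], [70.0, 82.0, 96.0])
print(f"MAE={mae:.3f} RMSE={rmse:.3f} MAPE={mape:.3f}% Pearson={r:.3f}")
```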

GaoXu007 commented 1 year ago

Hi @yahskapar , Thank you for your thoughtful answer. I used the htop command to check the cluster and indeed found a problem with excessive CPU usage due to other users.

Consequently, I conducted several experiments on another machine with the same configuration (except for the number of GPUs) and discovered that the time to run PURE_UBFC_TSCAN_BASIC.yaml decreased to 24 minutes (I had already pre-processed the dataset). This is much faster than the previous 53 minutes but still significantly slower compared to your 90 seconds.

I guess that the performance difference may be due to variations in the configurations of our computers. Since I didn't download the PURE dataset, I couldn't execute the relevant code for UBFC_PURE_TSCAN_BASIC.yaml, but running UBFC_UNSUPERVISED.yaml also took a considerable amount of time.

Prachiiitd commented 1 year ago

Hi authors, I have some realistic videos similar to your UBFC videos; however, I don't have the ground truth associated with them. Can I use your toolbox to predict the heart rate/BVP on these videos?

yahskapar commented 1 year ago

@GaoXu007,

Sorry to hear the performance issues persist. Can you create a new issue regarding this if you would still like to dive into it a bit more with me and any other toolbox users who are interested? This issue thread is already getting a bit long, with many of the original questions answered, so I will go ahead and close it.

@Prachiiitd,

Thanks for using the toolbox. You can use an unsupervised method, or a pre-trained model for a supervised method, to effectively generate a ground truth label from the predicted PPG, and perhaps store that label as a .txt file in a format similar to the UBFC-rPPG dataset. This will require some modification of the toolbox on your own fork of the repo, at least until some sort of official support for pseudo-label generation is added.

I believe both @girishvn and @xliucs have generated ground truth labels using the unsupervised POS method in the past, and may be able to give you more advice. Please make a new issue regarding this question to continue this discussion.