xliucs / MTTS-CAN

Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)
https://proceedings.neurips.cc/paper_files/paper/2020/file/e1228be46de6a0234ac22ded31417bc7-Paper.pdf
MIT License

Values of the diagram #31

Open MoustafaAlissa opened 1 year ago

MoustafaAlissa commented 1 year ago

Hey guys, we are a group from a university, and we are trying to use your implementation to analyze the face in order to get the pulse and respiration plots. The problem is that we are getting strange values on the axes of the plot, even though the video is only 30 seconds long. We would like to know how to change those values in the code. Thank you very much. (Attached screenshot: plot of the predicted signals.)

Varun-Tandon14 commented 1 year ago

Hi @MoustafaAlissa. Although I am not the author, I'll try to answer if that's okay with you. From the plot above and the information that you are using 30-second videos, I assume your video is recorded at about 30 FPS, given that a 30-second video has almost 900 values (from your plot). The MTTS-CAN model gives a prediction for each frame; hence you get about 900 values in your plot.

For post-processing, first, follow the steps given in predict_vitals.py.

An additional note on the cutoff frequencies of the Butterworth bandpass filter used by the authors:

  1. For heart rate: fl = 0.75 Hz and fh = 2 Hz, so the authors assume the average HR lies within [0.75 x 60, 2 x 60] BPM, i.e. [45, 120] BPM.
  2. For respiration rate: fl = 0.08 Hz and fh = 0.5 Hz, i.e. a normal respiration rate of [4.8, 30] breaths per minute.
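For concreteness, those band edges can be turned into filter coefficients like this. This is only a sketch assuming a 30 FPS video; note that scipy's butter expects cutoffs normalized by the Nyquist frequency (fs / 2):

```python
import numpy as np
from scipy.signal import butter

fs = 30.0  # assumed video frame rate (frames per second)

# 1st-order Butterworth bandpass for heart rate: 0.75-2 Hz (45-120 BPM).
b_pulse, a_pulse = butter(1, [0.75 / (fs / 2), 2.0 / (fs / 2)], btype='bandpass')

# 1st-order Butterworth bandpass for respiration: 0.08-0.5 Hz (4.8-30 breaths/min).
b_resp, a_resp = butter(1, [0.08 / (fs / 2), 0.5 / (fs / 2)], btype='bandpass')
```

The coefficients are then applied with scipy.signal.filtfilt for zero-phase filtering, as predict_vitals.py does.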

After filtering the signals as done in predict_vitals.py, you can follow these steps to get per-minute values:

  1. Calculate the power spectral density (you can use this) of both signals.
  2. Discard the PSD at all frequencies outside your desired range ([0.75, 2] Hz for HR, and so on).
  3. Find the frequency at which the PSD is maximal.
  4. Multiply that frequency by 60 (to convert from Hz to BPM).
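The four steps above can be sketched as follows. This is an illustrative example, not code from the repo: the estimate_rate_bpm helper and the synthetic test signal are my own, and the PSD is computed with scipy's periodogram as one possible choice:

```python
import numpy as np
from scipy.signal import periodogram

def estimate_rate_bpm(signal, fs, f_lo, f_hi):
    """Estimate a periodic rate (HR or RR) from a filtered signal.

    signal: 1-D filtered prediction from the model
    fs: sampling rate (video FPS)
    f_lo, f_hi: band of plausible frequencies in Hz
    """
    freqs, psd = periodogram(signal, fs=fs)        # step 1: power spectral density
    band = (freqs >= f_lo) & (freqs <= f_hi)       # step 2: keep only plausible freqs
    peak_freq = freqs[band][np.argmax(psd[band])]  # step 3: frequency of max power
    return peak_freq * 60.0                        # step 4: Hz -> per minute

# Sanity check on a synthetic 1.2 Hz "pulse" (72 BPM) sampled at 30 FPS:
fs = 30.0
t = np.arange(0, 30, 1 / fs)
fake_pulse = np.sin(2 * np.pi * 1.2 * t)
print(estimate_rate_bpm(fake_pulse, fs, 0.75, 2.0))  # ~72
```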

Finally, I'm sorry if you have already tried this and asked the question in a different context. I tried to explain to the best of my ability; others are welcome to correct me. Have fun learning.

MoustafaAlissa commented 1 year ago

Well, this is our code, and the problem is that we are still getting weird values. We have tried to print out the pulse value, but we are not getting understandable numbers, just things like -0.15... please help.

import tensorflow as tf
import numpy as np
import scipy.signal  # needed for scipy.signal.filtfilt below
import os
import sys
import argparse
sys.path.append('../')
from model import Attention_mask, MTTS_CAN

import matplotlib.pyplot as plt
from scipy.signal import butter
from inference_preprocess import preprocess_raw_video, detrend

def predict_vitals(args):
    img_rows = 36
    img_cols = 36
    frame_depth = 10

    model_checkpoint = 'C:/Users/aliss/OneDrive/Desktop/MTTS-CAN-main/mtts_can.hdf5'
    batch_size = args.batch_size
    fs = args.sampling_rate
    sample_data_path = args.video_path

    dXsub = preprocess_raw_video(sample_data_path, dim=36)
    print('dXsub shape', dXsub.shape)

    dXsub_len = (dXsub.shape[0] // frame_depth) * frame_depth
    dXsub = dXsub[:dXsub_len, :, :, :]

    model = MTTS_CAN(frame_depth, 32, 64, (img_rows, img_cols, 3))
    model.load_weights(model_checkpoint)

    yptest = model.predict((dXsub[:, :, :, :3], dXsub[:, :, :, -3:]), batch_size=batch_size, verbose=2)

    pulse_pred = yptest[0]
    pulse_pred = detrend(np.cumsum(pulse_pred), 100)
    [b_pulse, a_pulse] = butter(1, [0.75 / fs * 2, 2.5 / fs * 2], btype='bandpass')
    pulse_pred = scipy.signal.filtfilt(b_pulse, a_pulse, np.double(pulse_pred))

    avg_pulse_per_seq = np.mean(pulse_pred)
    fps = args.sampling_rate / frame_depth
    seq_per_min = 60 / fps
    avg_pulse_per_min = avg_pulse_per_seq * seq_per_min
    print("Average pulse per minute: {:.2f}".format(avg_pulse_per_min))

    ########## Plot ##################
    plt.subplot(211)
    plt.plot(pulse_pred)
    plt.title('Pulse Prediction')
    plt.show()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--video_path', type=str, help='processed video path')
    parser.add_argument('--sampling_rate', type=int, default=30, help='sampling rate of your video')
    parser.add_argument('--batch_size', type=int, default=100, help='batch size (multiplier of 10)')
    args = parser.parse_args()

    predict_vitals(args)

Varun-Tandon14 commented 1 year ago

Hello again, @MoustafaAlissa. I would strongly advise you and your team to brush up on the basics of signal processing. Remote PPG is essentially a signal-processing problem, so to use solutions like this repo effectively you should be familiar with some concepts from that field. Now, coming to the problem of calculating the average HR of the video.

The wrong way: after detrending and filtering, the signal's mean lies around zero. Think of it as calculating the mean of a sine wave that oscillates around 0. This is why you are getting weird values (like -0.15).

The right way: after filtering, the HR signal is like many sine waves of different frequencies added together. Your task is to find the frequency with the maximum power and take that frequency as the HR (in Hz). The steps are the same as in my previous comment. I will also add code this time from a very popular toolbox for such tasks, the rPPG-Toolbox. Have a look at the _calculate_fft_hr function here and you will get your answer.
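To see the difference concretely, here is a toy demonstration. The synthetic sine stands in for the filtered model output, and the 1.5 Hz frequency is an arbitrary choice for illustration:

```python
import numpy as np

fs = 30.0                             # assumed video frame rate
t = np.arange(0, 30, 1 / fs)          # 30-second clip, 900 samples
pulse = np.sin(2 * np.pi * 1.5 * t)   # synthetic 1.5 Hz pulse (90 BPM)

# The wrong way: the mean of a zero-centred oscillation is ~0, not a heart rate.
print(np.mean(pulse))                 # ~0.0

# The right way: pick the frequency with maximum spectral power.
spectrum = np.abs(np.fft.rfft(pulse)) ** 2
freqs = np.fft.rfftfreq(len(pulse), d=1 / fs)
mask = (freqs >= 0.75) & (freqs <= 2.0)
hr_hz = freqs[mask][np.argmax(spectrum[mask])]
print(hr_hz * 60)                     # ~90 BPM
```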

Have fun learning...