ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

Usage of Data_format parameter (e.g. NDCHW) in BaseLoader.py #293

Closed: Ishtiaque-khan97 closed this issue 2 months ago

Ishtiaque-khan97 commented 3 months ago

Hi, first of all, thank you for this amazing toolbox. I am trying to use it to run inference on my custom dataset. Could you please explain the purpose of the data format parameter (DATA_FORMAT, e.g., NDCHW)?

I see that it is used to reshape the video frame data in BaseLoader.py:

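import numpy as np

# Excerpt from BaseLoader.__getitem__ in BaseLoader.py (the rest of the method is omitted).
def __getitem__(self, index):
    """Returns a clip of video(3,T,W,H) and its corresponding signals(T)."""
    data = np.load(self.inputs[index])
    label = np.load(self.labels[index])

    print("***************Inside the getitem, Load file  : ***************" + self.inputs[index])
    # The preprocessed chunk is loaded as (D, H, W, C); reorder its axes according to DATA_FORMAT.
    if self.data_format == 'NDCHW':
        data = np.transpose(data, (0, 3, 1, 2))  # -> (D, C, H, W)
    elif self.data_format == 'NCDHW':
        data = np.transpose(data, (3, 0, 1, 2))  # -> (C, D, H, W)
    elif self.data_format == 'NDHWC':
        pass  # already (D, H, W, C)
    else:
        raise ValueError('Unsupported Data Format!')
    # ... (remainder of the method omitted in this excerpt)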
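To make the three branches concrete, here is a small NumPy sketch that applies the same permutations to a dummy chunk laid out as (D, H, W, C). The helper name `reorder_chunk`, the identity permutation standing in for `pass`, and the chunk size (180 frames of 72x72 RGB) are illustrative assumptions, not toolbox code.

```python
import numpy as np

# Axis permutations taken from the __getitem__ excerpt; each assumes the
# loaded chunk is laid out as (D, H, W, C).
PERMUTATIONS = {
    "NDCHW": (0, 3, 1, 2),  # -> (D, C, H, W)
    "NCDHW": (3, 0, 1, 2),  # -> (C, D, H, W)
    "NDHWC": (0, 1, 2, 3),  # identity, same as `pass`: stays (D, H, W, C)
}


def reorder_chunk(data: np.ndarray, data_format: str) -> np.ndarray:
    """Hypothetical helper mirroring the if/elif chain in BaseLoader.__getitem__."""
    if data_format not in PERMUTATIONS:
        raise ValueError("Unsupported Data Format!")
    return np.transpose(data, PERMUTATIONS[data_format])


# Dummy chunk: 180 frames of 72x72 RGB (sizes chosen only for illustration).
chunk = np.zeros((180, 72, 72, 3), dtype=np.float32)
for fmt in PERMUTATIONS:
    print(fmt, reorder_chunk(chunk, fmt).shape)
# NDCHW (180, 3, 72, 72)
# NCDHW (3, 180, 72, 72)
# NDHWC (180, 72, 72, 3)
```

Note that none of the branches produce a leading N axis: a single sample has only four dimensions.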

Is this function supposed to return data in the format (channels, time, height, width)? However, I saw that all of the current data loaders read video files and return NumPy arrays of shape (time, height, width, channels), and the format set for them in the config files is always "NDHWC". Hence, this branch of the if/elif chain will be run:

elif self.data_format == 'NDHWC':
    pass

So the data returned by __getitem__ should be in the format (time, height, width, channels), am I right?

yahskapar commented 2 months ago

Hi @Ishtiaque-khan97,

Thanks for using the toolbox! I would note the following:

Note that T is the temporal dimension; in this case it corresponds to the frames of a video. In the format strings, N corresponds to the batch size, D to the frames (the temporal dimension, i.e., T), H to the height, W to the width, and C to the channels. The listed representations (e.g., NDCHW) are the batched data representations, i.e., what the trainers for the neural methods see.
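A minimal NumPy sketch of where N comes from: stacking same-shaped samples along a new leading axis (which is also how PyTorch's default collation combines same-shaped tensors into a batch) turns per-sample (D, C, H, W) chunks into an NDCHW batch. The helper `describe`, the batch size of 4, and the chunk size are illustrative assumptions.

```python
import numpy as np

# Meaning of each letter in a DATA_FORMAT string, as described above.
AXIS_NAMES = {"N": "batch", "D": "frames (time)", "C": "channels", "H": "height", "W": "width"}


def describe(data_format: str, shape: tuple) -> dict:
    """Hypothetical helper: pair each letter of a format string with that axis's size."""
    if len(data_format) != len(shape):
        raise ValueError(f"{data_format} expects {len(data_format)} axes, got {len(shape)}")
    return {AXIS_NAMES[letter]: size for letter, size in zip(data_format, shape)}


# Four per-sample chunks as __getitem__ would return them for NDCHW: (D, C, H, W).
samples = [np.zeros((180, 3, 72, 72), dtype=np.float32) for _ in range(4)]

# Batching stacks the samples along a new axis 0; that new axis is N.
batch = np.stack(samples, axis=0)
print(batch.shape)  # (4, 180, 3, 72, 72)
print(describe("NDCHW", batch.shape))
# {'batch': 4, 'frames (time)': 180, 'channels': 3, 'height': 72, 'width': 72}
```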

Does the DATA_FORMAT parameter make more sense now?

Ishtiaque-khan97 commented 2 months ago

Hi @yahskapar, thank you so much for the explanation. This does clear it up a bit.

I have just two questions:

  1. So, if we want to use the 2D-CNN-based methods for inference or training, we should always set the parameter to "NDCHW" in the config files, and for the 3D-CNN-based methods we should always set it to "NCDHW". Is this correct?

  2. What format is the __getitem__ function returning? Is it supposed to return data in the format (channels, time, height, width)? However, for the unsupervised config files (e.g., UBFC-PHYS_UNSUPERVISED.yaml), it returns data in the format (time, height, width, channels), right? I am a little confused by this mismatch.

I would greatly appreciate it if you could please help clarify these.

yahskapar commented 2 months ago

Hi @Ishtiaque-khan97,

  1. Assuming you base your customized trainer on the existing trainers in this toolbox (e.g., the TS-CAN trainer), yes, that's correct. If you deviate significantly from those trainers, especially in how the data is fed into the network, you may need to adjust the data format as well.
  2. In the context of that function, it may make more sense to ignore N entirely, since the data isn't batched yet. I'd consider changing this in the code itself, but I suspect it was originally done this way to make related code elsewhere (e.g., in the trainer files) easier to follow. At the very least, a descriptive comment could be added. I think your current understanding is correct: the function returns the data in the specified format, with the axis order depending on the DATA_FORMAT setting, as I mentioned before. In the unsupervised case, it returns (D, H, W, C) (this matches the H, W, C layout that images have by default once loaded into NumPy arrays, I believe). The (3,T,W,H) in the docstring is a bit confusing, since it corresponds to (C, D, W, H), which, as far as I can tell, is never actually returned.
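For question 1, a rough NumPy sketch of why the two families want different layouts. A 2D conv layer (e.g., torch.nn.Conv2d) expects (N, C, H, W), so a TS-CAN-style trainer can fold N and D of an NDCHW batch into one axis and treat every frame as its own image; a 3D conv layer (e.g., torch.nn.Conv3d) expects (N, C, D, H, W), which is exactly NCDHW. The helper names and sizes are hypothetical, and the reshape is my reading of what a 2D-CNN trainer does, not a copy of toolbox code.

```python
import numpy as np


def to_2d_cnn_input(batch: np.ndarray) -> np.ndarray:
    """Hypothetical helper: NDCHW batch -> (N*D, C, H, W), one 2D image per frame."""
    n, d, c, h, w = batch.shape
    return batch.reshape(n * d, c, h, w)


def to_3d_cnn_input(batch: np.ndarray) -> np.ndarray:
    """Hypothetical helper: an NCDHW batch already matches (N, C, D, H, W), so use it as-is."""
    if batch.ndim != 5:
        raise ValueError("expected a 5D NCDHW batch")
    return batch


ndchw = np.zeros((4, 180, 3, 72, 72), dtype=np.float32)  # DATA_FORMAT: NDCHW
ncdhw = np.zeros((4, 3, 180, 72, 72), dtype=np.float32)  # DATA_FORMAT: NCDHW
print(to_2d_cnn_input(ndchw).shape)  # (720, 3, 72, 72)
print(to_3d_cnn_input(ncdhw).shape)  # (4, 3, 180, 72, 72)
```

Feeding the NCDHW batch to the 2D path would yield (12, 180, 72, 72), i.e., 180 "channels", which a 3-channel conv layer would reject. This is why the format must match the trainer.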

I'll try to verify this comment-related discrepancy a bit later when I have time and correct it to avoid future confusion.

Ishtiaque-khan97 commented 2 months ago

@yahskapar I understand now. Thank you for the detailed response. We can close this issue now if you want.

yahskapar commented 2 months ago

Great! Feel free to open a new issue if you have any other questions or concerns.