Closed Ishtiaque-khan97 closed 4 months ago
Hi @Ishtiaque-khan97,
Thanks for using the toolbox! I would note the following:
- `NDCHW` is for all 2DCNN-based neural methods (e.g., TS-CAN, DeepPhys) and their corresponding data types as defined in the default configs.
- `NCDHW` is for all 3DCNN-based neural methods (e.g., PhysNet, PhysFormer) and their corresponding data types as defined in the default configs.
- `NDHWC` seems like it's meant to correspond to the unsupervised methods (e.g., POS, CHROM) and the corresponding `Raw` data type, but it does seem a bit misleading the way it's currently noted - you can assume it's really just how the data is loaded initially through the dataloader.

Note that `T` is basically a temporal dimension; in this case it really corresponds to the frames of a video. `N` effectively corresponds to the batch size, `D` the frames or temporal dimension, `H` the height, `W` the width, and `C` the channels. The noted representations (e.g., `NDCHW`) are effectively the batched data representations (e.g., what the neural method trainers see).

Does the `DATA_FORMAT` parameter make more sense now?
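To make the three layouts concrete, here is a minimal NumPy sketch of how a single unbatched clip loaded as `(D, H, W, C)` could be reordered for each `DATA_FORMAT` value. The helper name `to_sample_layout` and the 180-frame, 72x72 clip size are assumptions for illustration only, not the toolbox's actual code or defaults.

```python
import numpy as np


def to_sample_layout(clip, data_format):
    """Hypothetical helper: reorder one (D, H, W, C) clip for a DATA_FORMAT string.

    The leading N (batch size) is not present here; it only appears once
    samples are batched.
    """
    if data_format == "NDCHW":   # 2DCNN-based methods, e.g. TS-CAN, DeepPhys
        return np.transpose(clip, (0, 3, 1, 2))  # (D, C, H, W)
    if data_format == "NCDHW":   # 3DCNN-based methods, e.g. PhysNet, PhysFormer
        return np.transpose(clip, (3, 0, 1, 2))  # (C, D, H, W)
    if data_format == "NDHWC":   # unsupervised methods, e.g. POS, CHROM
        return clip                               # (D, H, W, C), unchanged
    raise ValueError(f"Unsupported data format: {data_format}")


# Assumed example clip: 180 frames of 72x72 RGB.
clip = np.zeros((180, 72, 72, 3), dtype=np.float32)
for fmt in ("NDCHW", "NCDHW", "NDHWC"):
    print(fmt, to_sample_layout(clip, fmt).shape)
```

With this assumed clip, the per-sample shapes come out as `(180, 3, 72, 72)`, `(3, 180, 72, 72)`, and `(180, 72, 72, 3)` respectively.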
Hi @yahskapar , Thank you so much for the explanation. This does clear it up a bit.
I have just 2 questions.
1. So, if we wish to use the `2DCNN`-based methods for inference/training, we should always set the parameter to `"NDCHW"`, and for `3DCNN` methods we should always set it to `"NCDHW"` in the config files. Is this correct?
2. What format is the `__getitem__` function returning? Is it supposed to return data in the format (channels, temporal, height, width)? However, for the unsupervised config files (e.g., `UBFC-PHYS_UNSUPERVISED.yaml`), it is returning data in the format (time, height, width, channels), right? I am a little confused about this mismatch.
I would greatly appreciate it if you could please help clarify these.
Hi @Ishtiaque-khan97,
`N` completely since the data isn't batched yet. I would propose maybe changing this in the code itself, but I worry that the reason it was originally done this way was to make related code elsewhere (e.g., in the trainer files) more understandable. Perhaps a descriptive comment can be added at the least.

I think your current understanding is correct - it's effectively returning the data in the specified format, where the ordering may vary depending on the specified format, as I mentioned before. In the unsupervised case, it's really returning `(D,H,W,C)` (which corresponds to how images are loaded by default in NumPy, with `H,W,C`, I believe). I think perhaps the comment regarding `(3,T,W,H)` is a bit confusing, since realistically it corresponds to `(C,D,W,H)`, which shouldn't actually be returned based on my understanding. I'll try to verify this comment-related discrepancy a bit later when I have time and correct it to avoid future confusion.
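As a rough illustration of why `N` is absent at the `__getitem__` level, here is a minimal sketch in which each sample is returned unbatched and the leading batch dimension only appears once samples are stacked. `ToyClipDataset` and its tiny clip sizes are hypothetical stand-ins, not the toolbox's loader, and `np.stack` is used here only to mimic what a batching step would do.

```python
import numpy as np


class ToyClipDataset:
    """Hypothetical stand-in for a dataset; not the toolbox's loader."""

    def __init__(self, num_clips, frames=8, height=4, width=4, channels=3):
        rng = np.random.default_rng(0)
        # Clips are stored the way frames are typically loaded: (D, H, W, C).
        self.clips = [
            rng.random((frames, height, width, channels), dtype=np.float32)
            for _ in range(num_clips)
        ]

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, index):
        # One unbatched sample, reordered (D, H, W, C) -> (C, D, H, W),
        # i.e. the "NCDHW" layout without its leading N.
        return np.transpose(self.clips[index], (3, 0, 1, 2))


dataset = ToyClipDataset(num_clips=5)
sample = dataset[0]
# Stacking four samples along a new leading axis adds N.
batch = np.stack([dataset[i] for i in range(4)], axis=0)
print(sample.shape)  # per-sample shape, no N
print(batch.shape)   # batched shape, N first
```

In this sketch a single sample has shape `(3, 8, 4, 4)` and the stacked batch has shape `(4, 3, 8, 4, 4)`, which is what the `N` in `NCDHW` refers to.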
@yahskapar I understand now. Thank you for the detailed response. We can close this issue now if you want.
Great - feel free to make a new issue in case you have any other questions or have any concerns.
Hi, firstly, thank you for this amazing toolbox. I am trying to use it to run inference on my custom dataset. Could you please explain the function of the `DATA_FORMAT` parameter (e.g., `NDCHW`)?
I see that it is used to reshape the video frame data in the baseloader.py
Is this function supposed to return data in the format (channels, time, height, width)? However, I saw that all the current data loaders read video files and return NumPy arrays of shape (time, height, width, channels), and the format for them in the config files is always "NDHWC". Hence, this if_else clause will be run:
And so, the data returned by `__getitem__` should be in the format (time, height, width, channels), am I right?