revanurambareesh / instantaneous_transformer

Official repo of Instantaneous Transformers for Video-based Physiology Estimation (accepted to an AAAI workshop and Springer Studies in Computational Intelligence)

Train the model #2

Open · italosalgado14 opened this issue 2 years ago

italosalgado14 commented 2 years ago

Hello! Thanks for the code. I am trying to train the model, but I am facing some issues: 1) How much compute do I need to train the model (a GPU with 8 GB, 16 GB, which model, etc.)? 2) The arrangement of folders needed to train the model. What is "private/txt_bp/"? (It is needed in step 3, the BVP (Blood Volume Pulse) signal with a Butterworth filter, in dataloader.py.) In the V4V dataset there is no BP (Blood Pressure) information for the validation data (only for training). Thanks in advance!

revanurambareesh commented 2 years ago
  1. We used a 12 GB GPU card. However, you should be able to train the model with limited resources by tuning the hyperparameters.
  2. Since BP was not part of the validation set in the V4V challenge, the validation set is likely missing the BP information. However, there should be a ground-truth HR/RR file that can be used for validation directly.

For the test set, the ground truth is hosted on Codalab at this link.

italosalgado14 commented 2 years ago

Thanks for the response! I'm currently trying to train the model, but it's taking too long (after 3 days it still hasn't finished). I tried:

PS: My system has a 12 GB Nvidia RTX 3060 GPU, an 11th-gen i9, a 2 TB HDD, and 32 GB of RAM.

PS: Example log output:

2022-05-06 19:22:37.526 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:22:37.743 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F019_T3.mkv
2022-05-06 19:22:46.261 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:22:46.417 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F059_T7.mkv
2022-05-06 19:22:49.301 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:22:49.461 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/M039_T3.mkv
2022-05-06 19:23:16.464 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:16.851 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F077_T8.mkv
2022-05-06 19:23:19.609 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:19.849 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F019_T9.mkv
2022-05-06 19:23:20.662 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:20.866 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/M048_T3.mkv
2022-05-06 19:23:23.429 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:23.477 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:23.791 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F019_T3.mkv
2022-05-06 19:23:23.964 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F048_T9.mkv
2022-05-06 19:23:24.396 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:24.821 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F048_T9.mkv
2022-05-06 19:23:28.394 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:28.623 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F059_T7.mkv
2022-05-06 19:23:35.138 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:35.377 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/M039_T3.mkv
2022-05-06 19:23:59.483 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:23:59.717 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F077_T8.mkv
...
...
2022-05-06 19:39:53.534 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:39:53.800 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/F019_T3.mkv
2022-05-06 19:39:56.746 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:39:56.882 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/M026_T3.mkv
2022-05-06 19:40:00.011 DEBUG utils - get_all_frames [52]: Video closed. Returning frames.
2022-05-06 19:40:00.638 DEBUG utils - get_all_frames [13]: Reading the video from ../../data/v4v_dataset/Training/Videos/Training/vids/M029_T1.mkv
2022-05-06 19:40:00.785 DEBUG network - forward [53]: torch.Size([3200, 3, 36, 36])
2022-05-06 19:40:00.787 DEBUG network - forward [55]: torch.Size([3200, 32, 36, 36])
2022-05-06 19:40:00.788 DEBUG network - forward [57]: torch.Size([3200, 32, 34, 34])
2022-05-06 19:40:00.789 DEBUG network - forward [69]: torch.Size([3200, 32, 17, 17])
2022-05-06 19:40:00.790 DEBUG network - forward [74]: torch.Size([3200, 64, 17, 17])
2022-05-06 19:40:00.790 DEBUG network - forward [76]: torch.Size([3200, 64, 15, 15])
2022-05-06 19:40:00.791 DEBUG network - forward [90]: torch.Size([3200, 64, 7, 7])
2022-05-06 19:40:00.792 DEBUG network - forward [99]: torch.Size([3200, 128])
2022-05-06 19:40:00.792 DEBUG network - forward [104]: torch.Size([3200, 1])
tomgeek27 commented 2 years ago

Hi, I have the same problem. In general, it's not clear to me how to train the model to estimate the respiration rate and subsequently test it. I have some questions:

revanurambareesh commented 2 years ago

If you are using the V4V dataset, training takes long because the dataloader needs to process the large videos present in the original V4V dataset (#1).

You could either use/create a smaller dataset, or cache the processed files on disk in the following manner:

At the beginning of dataloader,

    # Reuse previously processed data when a cache file already exists.
    if os.path.exists(cachepath):
        cached_dict = np.load(cachepath, allow_pickle=True).item()

At the end of dataloader,

    # Persist the processed data so subsequent runs skip the video decoding.
    if self.use_cache:
        np.save(cachepath, return_dict)

If this helps you, I am happy to accept this change as a pull request.
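Putting the two snippets together, the caching pattern could look like the minimal sketch below. The class name `CachedDataset`, the placeholder processing step, and the dict contents are illustrative assumptions; only the `cachepath`/`use_cache`/`return_dict` names follow the snippets above.

```python
import os

import numpy as np


class CachedDataset:
    """Minimal sketch of disk-caching expensive per-video preprocessing."""

    def __init__(self, cachepath, use_cache=True):
        self.cachepath = cachepath
        self.use_cache = use_cache

    def load(self):
        # Reuse the cached result if it exists on disk.
        if self.use_cache and os.path.exists(self.cachepath):
            return np.load(self.cachepath, allow_pickle=True).item()
        # Otherwise do the expensive processing (placeholder data here,
        # standing in for the real frame extraction and signal filtering).
        return_dict = {"frames": np.zeros((4, 3, 36, 36)), "labels": np.zeros(4)}
        # Persist so the next epoch/run can skip the processing entirely.
        if self.use_cache:
            np.save(self.cachepath, return_dict)
        return return_dict
```

The first call pays the full processing cost and writes the cache; every later call is a single `np.load`, which is what removes the repeated video decoding visible in the log above.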

Following are the args used in this repo:

--batch_size: Batch size used for training/inference
--epochs: Total number of training epochs
--lr: Learning rate
--name: Experiment name
--datadir: Path to the directory containing the dataset
--test: Inference only
--seqlen: Length of the video sequence fed into the model
--gpu: GPU ID (e.g. 0)
--vidfps: FPS of the videos in the dataset (e.g. 25 for the V4V dataset)
--numlayer: Number of encoder layers in the transformer
--phys: Physiological signal type (e.g. HR or RR)
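As a sketch of how those flags fit together, the argparse setup below mirrors the list above. The defaults and the `choices` constraint are illustrative assumptions, not the repo's actual values.

```python
import argparse

# Hypothetical argparse setup mirroring the flags listed above;
# default values are placeholders, not the repo's real defaults.
parser = argparse.ArgumentParser(description="Instantaneous Transformer training")
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--epochs", type=int, default=50)
parser.add_argument("--lr", type=float, default=1e-4)
parser.add_argument("--name", type=str, default="experiment")
parser.add_argument("--datadir", type=str, default="../../data/v4v_dataset")
parser.add_argument("--test", action="store_true")  # inference only
parser.add_argument("--seqlen", type=int, default=3200)
parser.add_argument("--gpu", type=int, default=0)
parser.add_argument("--vidfps", type=int, default=25)
parser.add_argument("--numlayer", type=int, default=2)
parser.add_argument("--phys", type=str, choices=["HR", "RR"], default="HR")

# Example: train an RR model on 25 fps videos.
args = parser.parse_args(["--phys", "RR", "--vidfps", "25"])
```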

The gt_validation.txt file should contain the per-frame ground truth in the V4V submission format. You can find the V4V submission format here: Link

The X axis should be the iteration number for the training curve and the epoch number for the validation graphs. The Y axis is the loss value for the training and validation sets. As discussed in the paper, the loss formulation used in this repo is a negative max cross-correlation loss. (Paper)
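For intuition, a negative max cross-correlation loss can be sketched in NumPy as below. This is an illustration of the general idea (normalize both signals, take the maximum normalized cross-correlation over all lags, negate it), not the repo's exact implementation.

```python
import numpy as np


def neg_max_cross_corr(pred, target):
    """Negative maximum normalized cross-correlation of two 1-D signals.

    Illustrative sketch only; the paper's exact formulation may differ.
    Perfectly correlated signals give a loss near -1; uncorrelated
    signals give a loss near 0, so minimizing it pushes the prediction
    toward the target waveform up to an arbitrary time shift.
    """
    pred = (pred - pred.mean()) / (pred.std() + 1e-8)
    target = (target - target.mean()) / (target.std() + 1e-8)
    # Cross-correlation over all lags, normalized by the signal length.
    xcorr = np.correlate(pred, target, mode="full") / len(pred)
    return -xcorr.max()
```

Because the maximum is taken over all lags, the loss is insensitive to a constant phase offset between the predicted and ground-truth waveforms.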

tomgeek27 commented 2 years ago

Hi, thank you for the answer! I have another question, about the respiration chart. What if I want to plot the predicted respiration-rate chart? I mean, I would like to give one video as input and see the RR estimate computed by the trained model on that video. Is this possible?