HankKung opened this issue 4 months ago (status: open)
The I3D features are extracted with a sliding window, so the number of extracted features should be roughly the video length in frames minus the window size (commonly set to 24). To align the labels with the features, you can drop the labels covered by the first half window and the last half window.
That's just my understanding. You can refer to this repo for the extraction pipeline: https://github.com/v-iashin/video_features?tab=readme-ov-file
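To make the arithmetic concrete, here is a minimal sketch of the trimming described above. It assumes a stride of one frame and no padding, which may not match how the released features were produced; the function names `expected_feature_count` and `trim_labels` are made up for illustration and are not part of any repo.

```python
def expected_feature_count(num_frames, window_size=24):
    """Feature count under the assumption: video length minus window size."""
    # Assumption: stride of 1 frame, no padding at either end.
    return max(num_frames - window_size, 0)


def trim_labels(frame_labels, window_size=24):
    """Drop the labels covered by the first and last half window."""
    head = window_size // 2            # frames dropped at the start
    tail = window_size - head          # frames dropped at the end
    if len(frame_labels) <= window_size:
        return []                      # video too short to yield any feature
    return frame_labels[head:len(frame_labels) - tail]


# Example: a 100-frame video whose "labels" are just the frame indices.
labels = list(range(100))
trimmed = trim_labels(labels, window_size=24)
print(len(trimmed), expected_feature_count(100, 24))
```

If the feature file on disk is off by one or two frames from this count, the extractor probably uses a different stride or padding, so it is worth comparing against the actual array length before trimming.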
Hi, I'd like to run the pipeline on the raw videos, but I found that the lengths of the raw videos and their labels (downloaded from the official Breakfast dataset) differ from the lengths of the extracted I3D features and the corresponding labels.
Did you do any preprocessing on the raw videos before extracting features?