The visual and audio features you shared have the same first dimension. Is that necessary?
Yes, I kept the VGGish features the same length as the corresponding visual features. To reproduce this, you need to change the stride used when extracting VGGish features by changing "EXAMPLE_HOP_SECONDS = 0.96" to 0.32 in https://github.com/v-iashin/video_features/blob/master/models/vggish/vggish_src/vggish_params.py. The reason is that the visual features use fps=25, window_size=24 and stride=8, which corresponds to a window of 0.96 s and a stride of 0.32 s.
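For anyone checking the arithmetic, here is a minimal sketch of the frame-to-seconds conversion behind those numbers. The helper names and the 10-second clip are hypothetical, and the window count is an idealized sliding-window formula, so the actual extractors may round differently and return a slightly different number of features.

```python
import math


def frames_to_seconds(num_frames, fps):
    """Convert a length measured in frames to seconds."""
    return num_frames / fps


def approx_num_windows(duration_s, window_s, hop_s):
    """Idealized sliding-window count (assumption: no padding, no partial windows)."""
    if duration_s < window_s:
        return 0
    return math.floor((duration_s - window_s) / hop_s) + 1


# Visual settings quoted in the reply above.
FPS = 25
WINDOW_FRAMES = 24
STRIDE_FRAMES = 8

window_s = frames_to_seconds(WINDOW_FRAMES, FPS)  # 24 / 25
hop_s = frames_to_seconds(STRIDE_FRAMES, FPS)     # 8 / 25
print(f"window = {window_s:.2f} s, hop = {hop_s:.2f} s")

# Hypothetical 10-second clip, only to show how the hop changes the count.
clip_s = 10.0
count_default_hop = approx_num_windows(clip_s, window_s, 0.96)
count_matched_hop = approx_num_windows(clip_s, window_s, hop_s)
print(count_default_hop, count_matched_hop)
```

With the same 0.96 s window, lowering the hop from 0.96 s to 0.32 s yields roughly three times as many audio windows, which is why the audio features line up with the visual ones after the change.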
Thanks. This worked.
Hi. I am trying to extract visual and audio features from raw video clips. For visual features, I run
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
and the feature dimensions match those of the features you shared. E.g., it gives 112x1024 RGB and flow features, the same as yours.
But for audio features, both with and without converting the video to 25 fps, python main.py feature_type=vggish produces features that don't match the ones you shared. E.g., it gives only a 32x128 feature. Can you please tell me what needs to be done to get the same 112x128 audio feature?
Thank you.