rizentan opened 3 hours ago
Hi, of course. You can use the scripts provided by the excellent AVEL repo to extract audio features. If you are using the VGGish model for feature extraction, the resulting audio features should have shape T×128, where T is the temporal length.
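If the extraction script writes one stacked `.h5` file rather than per-video `.npy` files, a conversion along these lines might help. This is only a rough sketch, not the repo's own tooling: the function names, the `dataset_key` argument, the assumed (N, T, 128) layout of the stacked array, and the one-file-per-video naming are all assumptions, so please check them against your `.h5` file and against what the data loader expects.

```python
import os

import numpy as np


def load_h5_features(h5_path, dataset_key):
    """Read a stacked feature array from an .h5 file.

    h5py is imported lazily so the rest of this sketch runs without it.
    dataset_key is an assumption: open the file and inspect list(f.keys())
    to find the key your extraction script actually wrote.
    """
    import h5py

    with h5py.File(h5_path, "r") as f:
        return np.asarray(f[dataset_key])


def save_per_video_npy(features, video_ids, out_dir, feat_dim=128):
    """Write one (T, feat_dim) .npy file per video and return the paths.

    features is assumed to be shaped (N, T, feat_dim); video_ids gives the
    (hypothetical) file stem for each of the N videos, in the same order.
    """
    features = np.asarray(features, dtype=np.float32)
    if features.ndim != 3 or features.shape[2] != feat_dim:
        raise ValueError(
            f"expected shape (N, T, {feat_dim}), got {features.shape}"
        )
    if len(video_ids) != features.shape[0]:
        raise ValueError("video_ids and features must have the same length")

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for vid, feat in zip(video_ids, features):
        path = os.path.join(out_dir, f"{vid}.npy")
        np.save(path, feat)  # each saved array has shape (T, feat_dim)
        paths.append(path)
    return paths
```

Usage would look roughly like `save_per_video_npy(load_h5_features(h5_path, key), ids, out_dir)`, where `ids` lists the video names in the same order the features were extracted. It is worth loading one saved file with `np.load` and confirming its shape is T×128 before training.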
Hello author, thank you for your excellent work!
I want to use other datasets for video parsing training. I found the ".py" file for video feature extraction in the directory cpsp_avvp/scripts. How can I generate the ".npy" file for audio features in the data/kaets/vggish directory? Can I use the "Scripts for generating audio and visual features" mentioned by AVEL to generate them, but only generate the ".h5" file corresponding to the audio? Thanks a lot!