jasongief / CPSP

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

How to extract the ".npy" file corresponding to audio #4

Open rizentan opened 3 hours ago

rizentan commented 3 hours ago

Hello author, thank you for your excellent work!

I want to use other datasets for video parsing training. I found the ".py" file for video feature extraction in the directory cpsp_avvp/scripts. How can I generate the ".npy" files for audio features in the data/kaets/vggish directory? Can I use the "Scripts for generating audio and visual features" mentioned in the AVEL repo, but generate only the ".h5" file corresponding to the audio?

Thanks a lot!

jasongief commented 2 hours ago

Hi, of course. You can use the scripts provided by the excellent AVEL repo to extract audio features. If you use the VGGish model for feature extraction, the resulting audio features should have shape T x 128, where T is the temporal length.
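For readers who want to go from a stacked feature array to per-video ".npy" files, here is a minimal sketch. It is not taken from the CPSP or AVEL code. It assumes the audio features have already been loaded into a NumPy array of shape (N, T, 128), for example from the ".h5" file with a library such as h5py, and that one ".npy" file per video is the desired layout. The function name `save_per_video_features`, the video IDs, and the output directory are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_per_video_features(stacked, video_ids, out_dir, feat_dim=128):
    """Split an (N, T, feat_dim) array into one (T, feat_dim) .npy file per video.

    Hypothetical helper: the per-video layout and file naming are assumptions,
    not something confirmed by the CPSP or AVEL repositories.
    """
    stacked = np.asarray(stacked, dtype=np.float32)
    # VGGish embeddings are 128-dimensional, so the last axis must match feat_dim.
    if stacked.ndim != 3 or stacked.shape[2] != feat_dim:
        raise ValueError(f"expected shape (N, T, {feat_dim}), got {stacked.shape}")
    if len(video_ids) != stacked.shape[0]:
        raise ValueError("number of video IDs must match the first dimension")

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for video_id, feat in zip(video_ids, stacked):
        path = os.path.join(out_dir, f"{video_id}.npy")
        np.save(path, feat)  # each saved array has shape (T, feat_dim)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Random stand-in for real VGGish features: 2 videos, T = 10 segments each.
    rng = np.random.default_rng(0)
    dummy = rng.standard_normal((2, 10, 128)).astype(np.float32)
    with tempfile.TemporaryDirectory() as tmp_dir:
        saved = save_per_video_features(dummy, ["video_a", "video_b"], tmp_dir)
        print(np.load(saved[0]).shape)
```

The shape check is there to catch features that are not T x 128 before they are written to disk. Adjust the file naming to whatever the data loader you train with expects.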