shvdiwnkozbw / Multi-Source-Sound-Localization

This repo aims to perform sound localization in complex audiovisual scenes, where there are multiple objects making sounds.

How to create pkl file of audio-image pairs and the whole scripts for complete pipeline #5

Open JackHenry1992 opened 3 years ago

JackHenry1992 commented 3 years ago

In your code, I see that you feed one image to the network, but the AVE dataset is composed of videos. Do you only extract a single frame from each video? Also, I cannot find the pkl file of audio-image pairs. Furthermore, can you provide the detailed scripts for generate_vlabel and the gt-data?

shvdiwnkozbw commented 3 years ago

For the AVE dataset, we extract frames at 1 fps and use each image together with its corresponding 1-second audio clip as a pair for training and evaluation. The script for generating image pseudo labels is generate_labelv.py; you can add the path to your PyTorch pretrained ResNet to the script for inference.
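The pairing described above (frames extracted at 1 fps, each matched with its corresponding 1-second audio clip) can be sketched as follows. This is a minimal illustration, not code from the repo; the function name, sample rate, and waveform representation are assumptions:

```python
def make_pairs(audio, sr, n_frames, fps=1):
    """Split a mono waveform into 1-second clips, one per extracted frame.

    audio: flat sample sequence; sr: sample rate; n_frames: frames extracted
    at `fps`. Returns a list of (frame_index, audio_clip) pairs.
    """
    clip_len = sr // fps  # samples per clip (1 second when fps=1)
    pairs = []
    for i in range(n_frames):
        clip = audio[i * clip_len:(i + 1) * clip_len]
        if len(clip) < clip_len:  # drop a trailing partial clip
            break
        pairs.append((i, clip))
    return pairs

# A 10-second dummy waveform at 16 kHz yields 10 pairs at 1 fps.
sr = 16000
pairs = make_pairs([0.0] * (10 * sr), sr, n_frames=10)
print(len(pairs))  # 10
```

Each pair then feeds the network as one (image, 1-second audio) training example.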

nakaotatsuya commented 3 years ago

Hello, thank you for creating generate_labelv.py.

For one video you can get 10 audio/image pairs, since each AVE dataset video is 10 seconds long, right? I still don't know how to create the .pkl file of audio-image pairs, though. Could you share the script that builds the pkl file from the raw dataset?

poult-lab commented 3 years ago

> Hello, thank you for creating generate_labelv.py.
>
> For one video you can get 10 audio/image pairs, since each AVE dataset video is 10 seconds long, right? I still don't know how to create the .pkl file of audio-image pairs, though. Could you share the script that builds the pkl file from the raw dataset?

I also want to know how to get audio/image pairs ...
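Since the repo does not ship the pkl-building script, here is one plausible way to index audio-image pairs into a pickle file. The directory layout (frames/&lt;video_id&gt;/&lt;sec&gt;.jpg paired with audio/&lt;video_id&gt;/&lt;sec&gt;.wav) and the record fields are assumptions for illustration, not the authors' actual format:

```python
import os
import pickle

def build_pair_index(frame_root, audio_root, out_pkl):
    """Pair per-second frames with per-second audio clips and pickle the index.

    Assumed layout (hypothetical, not from the repo):
      frame_root/<video_id>/<sec>.jpg
      audio_root/<video_id>/<sec>.wav
    """
    pairs = []
    for vid in sorted(os.listdir(frame_root)):
        for fname in sorted(os.listdir(os.path.join(frame_root, vid))):
            sec = os.path.splitext(fname)[0]
            wav_path = os.path.join(audio_root, vid, sec + '.wav')
            if os.path.exists(wav_path):  # keep only complete pairs
                pairs.append({'video': vid, 'second': int(sec),
                              'image': os.path.join(frame_root, vid, fname),
                              'audio': wav_path})
    with open(out_pkl, 'wb') as f:
        pickle.dump(pairs, f)
    return pairs
```

A dataset class can then load the pkl once and read the image/audio paths per item; storing paths rather than raw arrays keeps the index small.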