merlresearch / AVSGS

Audio Visual Scene-Graph Segmentor
GNU Affero General Public License v3.0

Doubts about the data pre-processing #1

Open YYX666660 opened 1 year ago

YYX666660 commented 1 year ago

Dear authors: After reading the paper, I really appreciate your great work and the open-source code. But I have a question about the data pre-processing. How were the files Valid_Videos_Vis_Text.pickle and Vision_Text_Labels.csv generated? If I want to apply AVSGS to another dataset (the Fair-Play dataset), what should I do for the data pre-processing?

metro-merl commented 1 year ago

Hi @YYX666660 ,

Appreciate your interest in our work. The file Valid_Videos_Vis_Text.pickle lists the videos we retain after our pre-processing, which entailed discarding videos where the sound did not agree with the visuals, e.g., a static graphic playing while a baby cries in the background. Feel free to design/customize such a protocol for your dataset. Each row of Vision_Text_Labels.csv lists the label of the audio class, the label of the principal object in the Visual Genome dataset, the index of the frame where the most confident detection of this object was found, and which of the up to 20 objects detected in that frame corresponds to this principal object. Hope this helps!
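
In case it helps when adapting to another dataset such as Fair-Play, below is a minimal sketch of how these two files could be read. The file paths, the assumption that the pickle holds a collection of video identifiers, and the column order are taken from the description above rather than from the repository's loading code, so please verify them against the actual files.

```python
import pickle
import pandas as pd

# Sketch only: paths and schema are assumptions based on the description
# above, not the repository's actual data-loading code.

# Videos retained after the audio/visual consistency filtering.
with open("Valid_Videos_Vis_Text.pickle", "rb") as f:
    valid_videos = pickle.load(f)  # assumed: a collection of video identifiers

# Expected columns per row (assumed order, per the description above):
#   1. audio class label
#   2. principal-object label (Visual Genome vocabulary)
#   3. frame index of the most confident detection of that object
#   4. index of that object among the up-to-20 detections in the frame
labels = pd.read_csv("Vision_Text_Labels.csv")

for row in labels.itertuples(index=False):
    audio_label, vg_object, frame_idx, obj_idx = row[:4]
    print(audio_label, vg_object, frame_idx, obj_idx)
```

A pre-processing script for a new dataset would then only need to produce files with the same structure: a pickle listing the videos that pass your own audio/visual consistency check, and a CSV with one row per video following the four columns above.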