Open: YYX666660 opened this issue 1 year ago
Dear authors: After reading the paper, I really appreciate your great work and the open-source code, but I have a question about the data pre-processing. How were the files
Valid_Videos_Vis_Text.pickle
and Vision_Text_Labels.csv
generated? If I want to apply AVSGS to another dataset (the Fair-Play dataset), what should I do about the data pre-processing?

Hi @YYX666660,
Thanks for your interest in our work. The file Valid_Videos_Vis_Text.pickle lists the videos that remain after our pre-processing, which discards videos whose audio does not agree with the visuals, such as on-screen graphics playing while a baby cries in the background. You should feel free to design or customize such protocols for your own dataset. Each row of Vision_Text_Labels.csv records the label of the audio class, the label of the principal object in the Visual Genome dataset, the index of the frame where the most confident detection of this object was found, and which of the up to 20 objects detected in that frame corresponds to this principal object. Hope this helps!
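For concreteness, the per-row format described in the reply could be parsed with a minimal sketch like the one below. This assumes a plain four-column CSV in exactly the order described (audio class label, Visual Genome object label, frame index, object index within the frame's detections); the column order, any header row, and any extra columns (e.g. a video id) in the actual file are not confirmed by this thread.

```python
import csv
import io

# Hypothetical rows in the described format:
# audio_label, visual_genome_label, frame_index, object_index (0..19).
# The real Vision_Text_Labels.csv may differ in column order or count.
sample = io.StringIO(
    "dog_barking,dog,42,3\n"
    "guitar,guitar,17,0\n"
)

records = []
for audio_label, vg_label, frame_idx, obj_idx in csv.reader(sample):
    frame_idx, obj_idx = int(frame_idx), int(obj_idx)
    # At most 20 objects are detected per frame, so the index must be 0..19.
    assert 0 <= obj_idx < 20
    records.append((audio_label, vg_label, frame_idx, obj_idx))

print(records[0])  # → ('dog_barking', 'dog', 42, 3)
```

A similar loop over your own dataset's detections (keeping only videos whose audio matches the visuals, then recording the most confident detection per kept video) would reproduce the spirit of the authors' protocol on Fair-Play.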