rhgao / co-separation

Co-Separating Sounds of Visual Objects (ICCV 2019)
Creative Commons Attribution 4.0 International
92 stars, 23 forks

Class Label indices #10

metro-smiles closed 4 years ago

metro-smiles commented 4 years ago

Hi @rhgao,

Sorry for bugging you with another query.

I was trying to understand how the ground-truth labels are assigned for the object-consistency classifier and had a question about that. It seems the pretrained Faster R-CNN detector predicts over 16 classes, with the background as the first class (getDetectionResults.py, line 170). When these detections are loaded in the data loader, the ground-truth labels are first shifted by 1, so that they lie in [-1, ..., 14] rather than [0, ..., 15] (audioVisual_dataset.py, line 144). However, when constructing the label of the additional image (background), we assign it (self.opt.number_of_classes - 1), i.e. the last class, which is index 15 - 1 = 14 (audioVisual_dataset.py, line 162). Does this not conflict with the 15th object class, which also ends up at index 14 after the shift?

P.S.: I see that opt.number_of_classes is increased by 1 for the additional-image case in train.py (line 253), but this happens after the loaders have been defined (line 219).

Is this a typo, or am I missing something? I would really appreciate your input on this.
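To make the arithmetic concrete, here is a minimal sketch of the index bookkeeping described above. It is not code from the repository: the variable names are hypothetical, and it assumes opt.number_of_classes is 15 at the point where the additional-image label is computed.

```python
# Hypothetical stand-ins for the values discussed in this issue.
NUM_DETECTOR_CLASSES = 16   # background (index 0) + 15 object classes
number_of_classes = 15      # assumed value of opt.number_of_classes at that point

# Detector output space: 0 is background, 1..15 are objects.
detector_labels = list(range(NUM_DETECTOR_CLASSES))

# The loader shifts every label by 1, so background becomes -1
# and the object classes occupy 0..14.
shifted_labels = [label - 1 for label in detector_labels]
object_labels = [label for label in shifted_labels if label >= 0]

# Label assigned to the additional (background) image.
additional_image_label = number_of_classes - 1

# True if the additional image shares an index with an object class.
collides = additional_image_label in object_labels
print(shifted_labels[0], shifted_labels[-1], additional_image_label, collides)
```

Under these assumptions the additional image and the 15th object class both map to index 14.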

rhgao commented 4 years ago

Hi @metro-smiles, right, it might be a typo introduced during code cleanup. opt.number_of_classes should be increased by 1 before the data loader is defined. Thanks for pointing it out.