Open GondekNP opened 6 years ago
I think this is a really interesting idea. The detector might not be great at identifying a specific type of object (a particular model of car, for example), but it knows that it is a type of car, which is a type of vehicle, which is a type of manmade object.
This has been done before with promising results: http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w32/Kumar_Hierarchical_Category_Detector_ICCV_2017_paper.pdf.
Adding it will be non-trivial, so again I'm going to refer to @olgaliak for this. Another thing we could try, which would be a little easier to implement, is to just double/triple up the tags - if a car is also a vehicle and a manmade object, then anything tagged as a car would also be tagged as a vehicle and as a manmade object. Then, when predicting, if the detector predicts a "car" with low confidence but a "manmade object" or a "vehicle" with much higher confidence, we could drop the more specific tag in favor of the more general one. Where this would become challenging is with any form of Non-Maximum Suppression (NMS), which the TensorFlow Object Detection API applies by default in many of its models. You would have to modify the NMS step so that it does not suppress overlapping boxes whose tags are in the same branch of the hierarchy. Luckily, there's already NMS code written in map_evaluation.py, so you could reuse it as part of create_predictions.py. @olgaliak let me know if you're interested in trying this out - it would be a pretty cool modification to make!
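To make the NMS idea concrete, here is a rough standalone sketch. It does not use the NMS code in map_evaluation.py; the `PARENT` mapping, the function names, the `(ymin, xmin, ymax, xmax)` box layout and the thresholds are all assumptions for illustration only.

```python
# Hypothetical hierarchy as child -> parent; the tags are illustrative only.
PARENT = {"car": "vehicle", "vehicle": "manmade object"}

def lineage(tag):
    """Return the tag followed by all of its ancestors, most specific first."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def same_branch(a, b):
    """True when one tag is equal to, or an ancestor of, the other."""
    return a in lineage(b) or b in lineage(a)

def iou(a, b):
    """Intersection over union for (ymin, xmin, ymax, xmax) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def branch_aware_nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, tag, score) tuples.

    An overlapping box is kept when its tag differs from the kept box's tag
    but sits in the same branch (e.g. "car" under "vehicle"). Identical tags
    and unrelated tags are suppressed as usual.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        box, tag, _ = det
        suppressed = any(
            iou(box, k_box) > iou_threshold
            and (tag == k_tag or not same_branch(tag, k_tag))
            for k_box, k_tag, _ in kept
        )
        if not suppressed:
            kept.append(det)
    return kept

def most_specific_confident(tag_scores, min_score=0.5):
    """Pick the deepest tag whose score clears min_score, or None."""
    confident = [t for t, s in tag_scores.items() if s >= min_score]
    return max(confident, key=lambda t: len(lineage(t)), default=None)
```

With scores like `{"car": 0.3, "vehicle": 0.8, "manmade object": 0.9}` for one object, `most_specific_confident` would fall back to `"vehicle"`. Whether a single global threshold is the right cut-off is an open question; it is only a placeholder here.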
I think that this would be pretty cool! Maybe a model/project would need a "tag library" file that defines the hierarchy and how tags are nested within each other. This would be supplied upon project initialization and could be updated as new tags are needed for unanticipated labels.
Yeah, it could probably just be a list of child -> parent pairs and the code could create the tree structure.
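A minimal sketch of what that could look like, assuming the tag library is just a list of `(child, parent)` pairs. The pair list, `build_tree`, `ancestors` and `expand_tags` are hypothetical names, not anything that exists in the repo.

```python
from collections import defaultdict

# Hypothetical tag library: each entry is (child, parent).
PAIRS = [
    ("car", "vehicle"),
    ("truck", "vehicle"),
    ("vehicle", "manmade object"),
]

def build_tree(pairs):
    """Turn child -> parent pairs into parent and children lookups."""
    parent_of = {}
    children_of = defaultdict(list)
    for child, parent in pairs:
        if child in parent_of:
            if parent_of[child] != parent:
                raise ValueError(f"{child!r} is listed with two different parents")
            continue  # exact duplicate pair, nothing new to record
        parent_of[child] = parent
        children_of[parent].append(child)
    return parent_of, dict(children_of)

def ancestors(tag, parent_of):
    """Walk up from a tag to the root, nearest ancestor first."""
    chain = []
    seen = {tag}
    while tag in parent_of:
        tag = parent_of[tag]
        if tag in seen:
            raise ValueError("cycle in tag hierarchy")
        seen.add(tag)
        chain.append(tag)
    return chain

def expand_tags(tag, parent_of):
    """Return the tag plus every more general tag it implies."""
    return [tag] + ancestors(tag, parent_of)

parent_of, children_of = build_tree(PAIRS)
```

Here `expand_tags("car", parent_of)` gives the car, vehicle and manmade object tags in that order, which is the "double/triple up the tags" step applied at labeling time. Adding a new tag later would only mean appending one pair to the list.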
Abram and I were discussing a few edge cases that we are running into, specifically when we can identify a blurry or cut-off image down to a general group but not down to a species (e.g., we can tell it's a mammal but can't tell which, or we have a bad angle on a shearwater/petrel and cannot determine which), and we were wondering about the possibility of having hierarchical classes. In our biological use case, this makes a lot of sense - for example, if we label something as a Hawaiian petrel, we can assume it's an animal, part of the Procellariiformes order, part of the Pterodroma genus, etc. Then, if we couldn't ID a bird down to species, we could simply label it with the most specific label we have for it, whether it's Pterodroma, Procellariiformes, or just Aves.
I'm not sure how much re-invention would be required to make that work. In a perfect world, the model would be able to use all of the levels of the tag hierarchy, meaning that all of the specific tags would also be incorporated in each more general class, instead of each level being considered its own unique class.
Just curious if this is possible!
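One way to picture "using all of the levels" is a multi-hot target per box, where a label switches on itself and every more general tag. The sketch below is only an assumption about how that could be encoded: the `PARENT` mapping is a simplified slice of the taxonomy mentioned above, and `TAGS`, `lineage` and `targets_and_mask` are made-up names.

```python
# Simplified, hypothetical slice of the hierarchy (child -> parent).
PARENT = {
    "Hawaiian petrel": "Pterodroma",
    "Pterodroma": "Procellariiformes",
    "Procellariiformes": "Aves",
}

# Fixed tag order for the target vector; "mammal" has no parent listed here.
TAGS = ["Aves", "Procellariiformes", "Pterodroma", "Hawaiian petrel", "mammal"]

def lineage(tag):
    """Return the tag followed by all of its ancestors."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def targets_and_mask(label):
    """Build a multi-hot target and a mask for one labeled box.

    targets[i] is 1 for the label and each of its ancestors.
    mask[i] is 0 for tags below the label: we could not ID the animal
    that far, so those entries are unknown rather than negative.
    """
    active = set(lineage(label))
    targets = [1 if t in active else 0 for t in TAGS]
    mask = [0 if (t != label and label in lineage(t)) else 1 for t in TAGS]
    return targets, mask
```

A box labeled only as Procellariiformes would get targets `[1, 1, 0, 0, 0]` and mask `[1, 1, 0, 0, 1]`, so a loss that respects the mask would not punish the model for guessing Pterodroma. How to feed such targets into the detector's training loop is left open here.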