leobxpan closed this 5 years ago
This is a common problem in relationship detection. See Neural-Motifs (https://github.com/rowanz/neural-motifs) for an example: using only the statistics in the data may already give you good enough performance. Relationship detection is still in its infancy, and there are lots of things you can do.
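To make the "statistics in the data" point concrete, here is a minimal sketch of a frequency-prior baseline: it predicts the predicate purely from how often each one co-occurred with a (subject, object) pair in the training annotations, without looking at any visual feature. This is not code from Neural-Motifs or from this repository; the function names and the toy triples are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_frequency_prior(triples):
    """Count how often each predicate occurs for every (subject, object) class pair."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1
    return counts


def predict_predicate(counts, subj, obj, default=None):
    """Return the most frequent training predicate for the pair, ignoring visual input."""
    pair_counts = counts.get((subj, obj))
    if not pair_counts:
        return default  # pair never seen in training
    return pair_counts.most_common(1)[0][0]


# Toy annotations (hypothetical, not taken from Charades or Visual Genome).
train = [
    ("person", "closing", "door"),
    ("person", "closing", "door"),
    ("person", "opening", "door"),
    ("person", "holding", "cup"),
]

prior = fit_frequency_prior(train)
print(predict_predicate(prior, "person", "door"))
```

If a baseline like this scores close to the full model on a dataset, that suggests the predicate prediction is driven largely by object co-occurrence rather than by the action itself.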
Thanks for your reply. Is there any specific reason why you chose not to model the modules (i.e., objects, scenes, predicates) explicitly for Charades? I see you did that for ImageNet Video.
No, there's no specific reason. I think modeling each module individually may well help.
Got it. Thanks for your time!
Thanks for the great work!
It seems that for Charades you're computing each node (scene, object and predicate) feature conditioned on the entire I3D feature, rather than using an object detector. How do you then make sure that a node (actually two matrices) really is the feature for its corresponding entity (scene, object or predicate), rather than falling back to a classifier that leverages other information? For example, the node that predicts "closing a door" might not contain the actual action information, since it can cheat by seeing that there's a door.
Thanks!