happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

Negative Action Regions (in Epic Kitchens) #82

beasteers closed this issue 1 year ago

beasteers commented 1 year ago

How does ActionFormer behave when it encounters a region of video without a labeled action?

I can think of three possibilities

From what I can tell from the code, ActionFormer is not explicitly trained on negative samples, so my expectation is the first one.

tzzcl commented 1 year ago

ActionFormer doesn't have an extra "background" class. Instead, it uses a binary classifier for each of the N action classes, with targets in one-hot format. If a region of the video does not have a labeled action, the feature points located in that region are labeled as all zeros. If a point is inside an action region, it is labeled as [0,...,1,...,0].
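To make the labeling concrete, here is a minimal sketch of how such targets could be built. It is not taken from the ActionFormer code base: `build_targets`, its arguments, and the inclusive index convention are hypothetical names and assumptions for illustration, and the real label assignment may be more involved.

```python
from typing import List, Tuple

def build_targets(
    num_points: int,
    num_classes: int,
    segments: List[Tuple[int, int, int]],
) -> List[List[float]]:
    """Build per-point classification targets without a background class.

    segments holds (start_idx, end_idx, class_id) tuples with inclusive
    indices (an assumption of this sketch). Points outside every segment
    keep an all-zero target vector.
    """
    # Every point starts as "no action": all zeros.
    targets = [[0.0] * num_classes for _ in range(num_points)]
    for start, end, class_id in segments:
        first = max(start, 0)
        last = min(end, num_points - 1)
        for t in range(first, last + 1):
            # Points inside an action get a 1 at that action's class index.
            targets[t][class_id] = 1.0
    return targets

# Six feature points, three classes, one action of class 1 covering points 2..3.
targets = build_targets(num_points=6, num_classes=3, segments=[(2, 3, 1)])
for t, row in enumerate(targets):
    print(t, row)
```

Points 0, 1, 4 and 5 end up with all-zero targets, so they act as negatives for every class-wise binary classifier at once.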

beasteers commented 1 year ago

Ah, I see! I missed that the outputs are sigmoid, not softmax, which means it may predict multiple classes (or none) at a given point. Thanks so much!
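A small sketch of the difference, using made-up logits and a hypothetical 0.5 score threshold (neither comes from the ActionFormer code): with independent sigmoids every class score can be low at a background point, whereas softmax scores always sum to 1 and so always favor some class.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def active_classes(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of classes whose sigmoid score exceeds the threshold."""
    return [i for i, x in enumerate(logits) if sigmoid(x) > threshold]

# Illustrative logits for a point in an unlabeled (background) region.
background_logits = [-4.0, -5.0, -3.5]
sigmoid_scores = [sigmoid(x) for x in background_logits]
softmax_scores = softmax(background_logits)

print(sigmoid_scores)   # every score is small: no class fires
print(softmax_scores)   # scores sum to 1: one class still "wins"
print(active_classes(background_logits))

# Illustrative logits for a point where two actions overlap.
overlap_logits = [3.0, 2.5, -4.0]
print(active_classes(overlap_logits))
```

With sigmoid outputs, "no action" is simply the case where no class score passes the threshold, so no dedicated background class is needed.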