beasteers closed this issue 1 year ago.
ActionFormer does not have an extra "background" class. Instead, it uses N independent binary (sigmoid) classifiers, one per action class, with targets in one-hot format. If a region of the video has no labeled action, the feature points located in that region are annotated with an all-zero target vector. If a point lies inside an action region, it is labeled [0,...,1,...,0], with the 1 at that action's class index.
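To make the labeling scheme concrete, here is a minimal sketch under simplifying assumptions. It is not ActionFormer's code. `build_point_targets` and `bce_per_point` are hypothetical helpers. Every point inside a segment is treated as positive, with none of the repository's real point-to-segment assignment rules. Plain binary cross-entropy stands in for whatever per-class sigmoid loss the repository actually uses. The sketch shows that background points get an all-zero target and still produce a nonzero loss that pushes every class score down.

```python
import math

def build_point_targets(num_points, num_classes, segments):
    """Hypothetical helper: one target vector per temporal feature point.

    segments: list of (start, end, class_id), with start/end given as point
    indices (end exclusive). Points outside every segment keep an all-zero
    vector, so they act as background without a dedicated background class.
    """
    targets = [[0.0] * num_classes for _ in range(num_points)]
    for start, end, cls in segments:
        for t in range(start, end):
            targets[t][cls] = 1.0  # one-hot position for this action class
    return targets

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_per_point(logits, target):
    """Sum of independent binary cross-entropy terms, one per class."""
    loss = 0.0
    for z, y in zip(logits, target):
        p = sigmoid(z)
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss

# 6 points, 3 classes, one action of class 2 covering points 1 and 2.
targets = build_point_targets(6, 3, [(1, 3, 2)])
print(targets[0])  # background point: all zeros
print(targets[1])  # action point: one-hot at class 2
# A background point still contributes loss unless all its logits are low.
print(bce_per_point([0.0, 0.0, 0.0], targets[0]))
print(bce_per_point([-5.0, -5.0, -5.0], targets[0]))
```

With an all-zero target, every class term reduces to `-log(1 - p)`. Minimizing that term drives each sigmoid score toward 0. So under this kind of loss, unlabeled points do act as training signal for "no action here".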
Ah, I see! I missed that the outputs are sigmoid rather than softmax, which means the model can predict multiple classes for a point (or none at all). Thanks so much!
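The difference between softmax and sigmoid outputs can be seen on a couple of toy logit vectors. This is a sketch, not ActionFormer's inference code. The logits are made up, and the 0.5 cutoff in `classes_above` is a hypothetical threshold, not the repository's setting. Softmax scores always sum to 1, so some class can clear the cutoff even at a background point. Independent sigmoid scores can all stay low, or several can be high at once.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_scores(logits):
    # Each class gets its own independent probability.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def classes_above(scores, threshold):
    # Hypothetical post-processing: keep classes whose score clears a cutoff.
    return [i for i, s in enumerate(scores) if s >= threshold]

# Made-up logits for a point in an unlabeled region: all strongly negative.
background_logits = [-4.0, -5.0, -6.0]
print(classes_above(softmax(background_logits), 0.5))         # softmax still picks a class
print(classes_above(sigmoid_scores(background_logits), 0.5))  # sigmoid picks none

# Made-up logits where two actions overlap.
overlap_logits = [3.0, 2.5, -3.0]
print(classes_above(softmax(overlap_logits), 0.5))         # only one class can win
print(classes_above(sigmoid_scores(overlap_logits), 0.5))  # both actions survive
```

Softmax forces the scores to compete, so "nothing here" would need its own background class. Sigmoid lets each class say "no" independently, which is why the all-zero target is enough.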
How does ActionFormer behave when it encounters a region of video without a labeled action?
I can think of three possibilities
From what I can tell from the code, ActionFormer is not explicitly trained on negative samples, so I expect it is the first one.