Regarding Epic-Kitchen Combination of Noun and Verb

HYUNJS commented 1 year ago

Thank you for sharing your wonderful work! I would like to ask how you combined the noun and verb action proposals to achieve the reported action-class performance on Epic-Kitchen. I saw that two existing issues already ask the same question, but I would appreciate it if you could share the exact implementation or enough detail to reproduce the result.

https://github.com/happyharrycn/actionformer_release/issues/7 https://github.com/happyharrycn/actionformer_release/issues/29#issuecomment-1134193785

For this part, you can have various choices to fuse these results. For example, you can get the noun/verb predictions for the same point, then take the noun or verb segment predictions as the final segment for this point, or you can simply take the average of these two segments. You may need to modify the code a little bit. We may update this part shortly.

happyharrycn commented 1 year ago

In our previous entry to the EPIC-Kitchen competition, we trained two separate localization models (one for noun and one for verb), and combined their results at inference time. This is done by the following steps.

A video is fed into the two models in parallel.
Each model decodes (a) confidence scores for the target concepts (noun or verb); and (b) the starting and ending time of a candidate event, on a pyramid with multiple levels.
The two pyramids (one from each model) share the same structure (i.e., same number of levels and same number of slots in each level).
Thus, each slot on this pyramid structure has (1) confidence scores for nouns; (2) confidence scores for verbs; (3) boundaries for nouns; and (4) boundaries for verbs.
(1) and (2) are multiplied and square-rooted (i.e., the geometric mean of a verb score and a noun score) to generate scores for individual actions (as combinations of a verb and a noun).
(3) and (4) are averaged to produce the temporal boundary.
Results from all slots are gathered and further processed by soft NMS.
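
For anyone trying to reproduce this, here is a minimal NumPy sketch of the steps above. It is not the authors' code. The function names (`fuse_noun_verb`, `soft_nms_1d`), the array layouts, the per-slot `topk` pruning, the Gaussian decay, and the class-agnostic NMS are all assumptions made for illustration; the thread only specifies the geometric mean of the scores, the averaging of the boundaries, and the use of soft NMS.

```python
import numpy as np


def fuse_noun_verb(noun_scores, verb_scores, noun_segs, verb_segs, topk=5):
    """Fuse per-slot outputs of a noun model and a verb model.

    noun_scores: (S, N) noun confidences for S pyramid slots (all levels flattened)
    verb_scores: (S, V) verb confidences for the same S slots
    noun_segs, verb_segs: (S, 2) [start, end] times predicted at each slot
    Returns segments (S*k, 2), scores (S*k,), verb ids (S*k,), noun ids (S*k,).
    """
    # Geometric mean for every (verb, noun) pair at each slot: shape (S, V, N).
    action_scores = np.sqrt(verb_scores[:, :, None] * noun_scores[:, None, :])
    # Average the two boundary predictions at each slot.
    segs = 0.5 * (noun_segs + verb_segs)

    num_slots, num_verbs, num_nouns = action_scores.shape
    flat = action_scores.reshape(num_slots, num_verbs * num_nouns)
    # Assumption: keep only the k best actions per slot to limit candidates.
    k = min(topk, num_verbs * num_nouns)
    top = np.argsort(-flat, axis=1)[:, :k]
    scores = np.take_along_axis(flat, top, axis=1)
    # Flat index = verb_id * num_nouns + noun_id.
    verb_ids, noun_ids = np.divmod(top, num_nouns)
    return (np.repeat(segs, k, axis=0), scores.ravel(),
            verb_ids.ravel(), noun_ids.ravel())


def soft_nms_1d(segs, scores, sigma=0.5, min_score=1e-3):
    """Class-agnostic Gaussian soft NMS over 1D segments (an assumed variant)."""
    segs, scores = segs.copy(), scores.copy()
    kept_segs, kept_scores = [], []
    while scores.size > 0:
        i = int(np.argmax(scores))
        best_seg, best_score = segs[i], scores[i]
        if best_score < min_score:
            break
        kept_segs.append(best_seg)
        kept_scores.append(best_score)
        segs = np.delete(segs, i, axis=0)
        scores = np.delete(scores, i)
        if scores.size == 0:
            break
        # Temporal IoU between the selected segment and the remaining ones.
        inter = np.clip(np.minimum(segs[:, 1], best_seg[1])
                        - np.maximum(segs[:, 0], best_seg[0]), 0.0, None)
        union = (segs[:, 1] - segs[:, 0]) + (best_seg[1] - best_seg[0]) - inter
        iou = inter / np.maximum(union, 1e-8)
        # Decay the scores of overlapping segments instead of removing them.
        scores = scores * np.exp(-(iou ** 2) / sigma)
    return np.array(kept_segs).reshape(-1, 2), np.array(kept_scores)
```

In practice the NMS would probably be run per action class, or with whatever settings the released config uses; the sketch keeps it class-agnostic only for brevity.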

HYUNJS commented 1 year ago

Thank you for your reply :)

happyharrycn / actionformer_release

Regarding Epic-Kitchen Combination of Noun and Verb #97