tvaranka closed this issue 1 year ago
There is a misunderstanding in your example. The action will only be assigned to one of the feature pyramid levels, not to all of them. This assignment is controlled by the regression ranges. Each pyramid level has its own regression range, and only actions whose durations fall within that range are assigned to the level (see here). On THUMOS'14, we used non-overlapping regression ranges, so each action is assigned to one of the pyramid levels (more precisely, at most two neighboring levels in some corner cases).
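To make the effect on label sparsity concrete, here is a rough sketch that counts how many time steps one action would mark as positive. It follows only the description above (assignment by duration) and is not the repository's actual code. The stride of `2 ** level` per pyramid level and the helper names `points_inside` and `positives_per_level` are assumptions made for illustration, and any further filtering the real implementation may apply is ignored. The segment is the first one quoted later in this thread, in feature-grid units.

```python
import math

REGRESSION_RANGES = [(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)]
# Assumption: level i samples one point every 2**i feature grids.
STRIDES = [2 ** i for i in range(len(REGRESSION_RANGES))]


def points_inside(start, end, stride):
    """Count grid points t = k * stride with start <= t <= end."""
    first = math.ceil(start / stride)
    last = math.floor(end / stride)
    return max(0, last - first + 1)


def positives_per_level(start, end):
    """Positive points per level when only the matching level gets the action."""
    duration = end - start
    counts = []
    for (lo, hi), stride in zip(REGRESSION_RANGES, STRIDES):
        in_range = lo <= duration <= hi
        counts.append(points_inside(start, end, stride) if in_range else 0)
    return counts


start, end = 45.0, 101.25  # duration 56.25 feature grids
sparse = positives_per_level(start, end)
dense = [points_inside(start, end, s) for s in STRIDES]

print("assigned by range:", sparse, "total", sum(sparse))
print("if on every level:", dense, "total", sum(dense))
```

Under these assumptions the action contributes 4 positive points (all on the level covering 32-64) rather than the 112 it would contribute if it were labeled on every level, which is in line with the small number of non-zero labels reported in the question.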
So for the segments [45.0000, 101.2500], [116.2500, 155.2500], [294.0000, 322.5000] with lengths [56.2500, 39.0000, 28.5000], their pyramids should be [4, 4, 3] (regression_ranges = [(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)]).
For the first segment it should be pyramid level 4 (counting from 0), because that level covers the range 32-64 and the length is 56.25. Is this correct?
Also what do you mean by the corner cases? Could you provide an example?
Thanks!
Yes, that is correct.
A corner case can happen when the duration of an action lies exactly on the boundary between two ranges. For example, with regression_ranges = [(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)], our current implementation will assign an action with a duration of 4 (feature grids) to both the first and the second pyramid level.
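The boundary behavior can be reproduced with a few lines. This is a minimal sketch that assumes both ends of each range are compared inclusively, which is what the description above implies; the function name `assigned_levels` is hypothetical and levels are indexed from 0.

```python
REGRESSION_RANGES = [(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)]


def assigned_levels(duration, ranges=REGRESSION_RANGES):
    """Return the 0-based levels whose range contains the duration.

    Assumption: both bounds are inclusive, so a duration that equals a
    shared boundary matches two neighboring levels.
    """
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= duration <= hi]


# Durations of the three segments discussed above, plus the boundary case.
for duration in (56.25, 39.0, 28.5, 4.0):
    print(duration, assigned_levels(duration))
```

The three segment durations map to levels 4, 4 and 3, while a duration of exactly 4 matches both level 0 and level 1, that is, the first and the second pyramid level.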
Thanks for the explanations!
Hey, thanks for the great work.
I have some trouble wrapping my head around the gt_cls_labels variable.

Example
Here is an example of my understanding. A video with 10 frames that has action class 4 during frames 2-5 should have the following gt_cls_label:

Sample
Now for an actual sample from a real dataset. When I look at the gt_cls_labels for a video in THUMOS (video_validation_0000203) with segments gt_segments = [[45.0000, 101.2500], [116.2500, 155.2500], [294.0000, 322.5000], ...], the gt_cls_labels is mostly empty. The first pyramid level is all 0s with some 7s here and there, but not nearly as many as I would expect. In fact, there are only 47 non-zero values in the whole gt_cls_labels. I would like to know where I have gone wrong and if you could explain why the gt_cls_labels is so sparse. Thanks!

Code
Here is a minimal example to print the non-zero locations of gt_cls_labels for the sample video.