wangxiang1230 / OadTR

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".
MIT License

Regarding the dimensionality of the input rgb and flow features #15

Closed Aierhaimian closed 2 years ago

Aierhaimian commented 2 years ago

Dear author, I want to use the repo you gave to generate my own dataset, and then I found this repo. When I generated RGB and flow features separately according to the method of this repo, I found that the RGB feature of each frame is a vector of length 2048, while the flow feature is a vector of length 1024. However, the RGB and flow features of each frame in the THUMOS14 dataset you released are both of length 2048. How do you deal with this difference?

Some additional details: the RGB feature is taken from the "caffe.Flatten_673" output of the "resnet200_anet_2016" model, and the flow feature is taken from the "global_pool" output of the "bn_inception_anet_2016_temporal" model.

wangxiang1230 commented 2 years ago

Hi, we've open-sourced two different feature sets: TSN-ANet (3072 dimensions, 2048 + 1024) and TSN-Kinetics (4096 dimensions, 2048 + 2048). Our feature extraction is also consistent with that of previous methods, such as IDN and TRN.
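As a rough illustration of the dimension arithmetic above (not the repo's actual loading code), the sketch below concatenates a per-frame RGB vector with a per-frame flow vector for each feature set. The helper name `concat_two_stream`, the zero-filled placeholder vectors, and the RGB-then-flow ordering are assumptions made for the example; only the sizes 2048, 1024, 3072, and 4096 come from this thread.

```python
# Illustrative sketch only: per-frame two-stream feature concatenation.
# The dimensions come from the thread; the helper name, the zero-filled
# placeholder vectors, and the RGB-then-flow ordering are assumptions.

RGB_DIM = 2048            # RGB feature length reported in the thread
FLOW_DIM_ANET = 1024      # flow feature length for TSN-ANet
FLOW_DIM_KINETICS = 2048  # flow feature length for TSN-Kinetics


def concat_two_stream(rgb_feat, flow_feat):
    """Return one per-frame vector: RGB features followed by flow features."""
    return list(rgb_feat) + list(flow_feat)


# Placeholder vectors standing in for real extractor output.
rgb = [0.0] * RGB_DIM
flow_anet = [0.0] * FLOW_DIM_ANET
flow_kinetics = [0.0] * FLOW_DIM_KINETICS

anet_frame = concat_two_stream(rgb, flow_anet)
kinetics_frame = concat_two_stream(rgb, flow_kinetics)

print(len(anet_frame))      # 3072
print(len(kinetics_frame))  # 4096
```

With NumPy arrays the same step would typically be `np.concatenate([rgb, flow], axis=-1)`. The 2048 + 1024 split described in the question matches the TSN-ANet sizes, while a 2048 + 2048 split matches TSN-Kinetics.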

Aierhaimian commented 2 years ago

Thank you very much for clearing up my confusion. I understand the problem now.