Merge rgb and flow? - Githubissues

happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)

MIT License

419 stars 77 forks

Merge rgb and flow? #55

Closed bleakie closed 1 year ago

bleakie commented 1 year ago

When merging RGB and flow features, do you apply normalization or standardization?

tzzcl commented 1 year ago

Hi, I don't fully understand your question. Currently we directly concatenate RGB and Flow features, without normalization or standardization.
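For readers who want to see what "directly concatenate" might look like, here is a minimal sketch in plain Python. It is not the repository's code: the function names `concat_rgb_flow` and `standardize` are hypothetical, the toy features have 2 channels per stream instead of real I3D dimensions, and the standardization step is only the optional experiment raised in the question, not something the authors apply.

```python
from statistics import mean, pstdev


def concat_rgb_flow(rgb, flow):
    """Concatenate per-time-step RGB and flow feature vectors along the channel axis."""
    if len(rgb) != len(flow):
        raise ValueError("RGB and flow must have the same number of time steps")
    return [r + f for r, f in zip(rgb, flow)]


def standardize(features, eps=1e-8):
    """Optional experiment: zero mean, unit variance per channel across time.

    The authors state they do NOT apply this; it is shown only for comparison.
    """
    channels = list(zip(*features))
    mus = [mean(c) for c in channels]
    sigmas = [pstdev(c) for c in channels]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(row, mus, sigmas)]
        for row in features
    ]


# Toy input: 3 time steps, 2 channels per stream (illustrative values only).
rgb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
flow = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

fused = concat_rgb_flow(rgb, flow)   # 3 time steps, 4 channels
fused_std = standardize(fused)       # same shape, per-channel standardized
```

Running an experiment with and without `standardize` would be one way to check whether the scale difference between the two streams matters for a given dataset.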

bleakie commented 1 year ago

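Let me switch to my native language. Two questions: 1. RGB and Flow features may differ in magnitude. Without normalization or standardization, could one of the two feature types end up not contributing? 2. When extracting Flow features, what step did you use (i.e., the distance between the two frames used to compute optical flow)?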

tzzcl commented 1 year ago

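For your questions, I would suggest reading the I3D paper and Two-Stream Network for Action Recognition alongside my answer. First question: RGB and Flow features are usually extracted by the same network architecture (although with different weights). Both RGB and Flow inputs take the form of images, i.e., values in the range [0, 255], and the output features are usually taken from just before the final fully connected layer. Because the network structure is the same and the input data range is the same, there is generally no need for extra normalization/standardization of the features. Of course you can try it, and if it works better, that is fine too. Second question: for optical flow the step is 1, i.e., flow is extracted frame by frame, computed directly between adjacent frames. A video with N frames in total yields N-1 optical flow frames. A sliding window is then used to extract features.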
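The N-1 count for step 1 can be checked with a small sketch. The helper name `flow_frame_pairs` is hypothetical and only enumerates which frame indices would be paired; it does not compute optical flow itself.

```python
def flow_frame_pairs(num_frames, step=1):
    """Return the (source, target) frame index pairs used for optical flow.

    With step=1, flow is computed between each pair of adjacent frames.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return [(i, i + step) for i in range(num_frames - step)]


pairs = flow_frame_pairs(5)  # a 5-frame video, adjacent frames
num_flow_frames = len(pairs)  # N - 1 flow frames for step=1
```

With a larger step the same helper shows that a video of N frames would yield N - step flow frames, which is one reason the step matters when aligning RGB and flow features in time.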

bleakie commented 1 year ago

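Thank you very much for your patient reply. While debugging the code I ran into a question. In I3D Feature Extraction, frequency=16, which I understand to mean that every 16 frames are merged and extracted as one feature, so a video with T=1600 frames gives a 100×2048 feature (I am not sure whether this understanding is correct). In actionformer_release, suppose ([3,4] * 30 - 0.5 * 16) / 4 = [20.5, 28], whereas by my understanding the original label should correspond to [5.625, 7.5]. Where did I go wrong?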

tzzcl commented 1 year ago

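For feature extraction, besides the number of input frames (16), there is also the input stride, i.e., how many frames apart each 16-frame input clip is sampled. For THUMOS14, we extract a feature every 4 frames. That is where the main error lies.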
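The two results discussed above can be reproduced with a short worked example. This is a sketch of the arithmetic only, using the numbers from the thread (segment [3, 4] seconds, 30 fps, 16-frame clips); the function and parameter names are illustrative and are not claimed to match the repository's code.

```python
def seconds_to_feature_grid(segment_sec, fps, num_frames, feat_stride):
    """Map a segment in seconds onto the feature grid.

    (t * fps) converts seconds to frames, 0.5 * num_frames shifts by half a
    clip, and dividing by feat_stride converts frames to feature indices.
    """
    offset = 0.5 * num_frames
    return [(t * fps - offset) / feat_stride for t in segment_sec]


# Stride of 4 frames between clips, as stated for THUMOS14.
with_stride_4 = seconds_to_feature_grid([3.0, 4.0], fps=30, num_frames=16, feat_stride=4)

# The questioner's reading: one feature per 16 frames, no half-clip offset.
naive_stride_16 = [t * 30 / 16 for t in [3.0, 4.0]]
```

Here `with_stride_4` evaluates to `[20.5, 28.0]` and `naive_stride_16` to `[5.625, 7.5]`, so the gap between the two numbers comes from the stride (4 versus 16) together with the half-clip offset.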

bleakie commented 1 year ago

I don't see any stride-related code in I3D Feature Extraction?

tzzcl commented 1 year ago

> I don't see any stride-related code in I3D Feature Extraction?

Please refer to the frequency parameter at https://github.com/Finspire13/pytorch-i3d-feature-extraction/blob/master/extract_features.py#L164
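To illustrate how a stride between clips changes the number of features, here is a minimal sliding-window sketch. It assumes clips that must fit entirely inside the video; the linked script may treat the start or end of the video differently (for example by padding), so the counts are indicative only. The name `clip_starts` is hypothetical.

```python
def clip_starts(total_frames, clip_len=16, stride=4):
    """Start indices of fixed-length clips sampled every `stride` frames.

    Only clips that fit fully inside the video are kept (an assumption of
    this sketch, not necessarily the behavior of the linked extractor).
    """
    return list(range(0, total_frames - clip_len + 1, stride))


# A 1600-frame video with 16-frame clips:
non_overlapping = clip_starts(1600, clip_len=16, stride=16)  # stride equals clip length
overlapping = clip_starts(1600, clip_len=16, stride=4)       # stride of 4 frames
```

Under this assumption, a stride of 16 gives 100 clips, matching the 100×2048 figure in the question, while a stride of 4 gives 397 overlapping clips.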

happyharrycn commented 1 year ago

Closed due to inactivity.