Closed misayllk closed 1 year ago
Hi,
I think you could mix features from frame-wise extractors such as resnet50 and CLIP, since both produce one feature vector per frame. I3D features rely on video clips (64 frames processed at once) and thus have a different temporal dimension.
Also, note that RAFT does not output a feature vector: it extracts a full-resolution frame of optical flow directions.
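If you still want to combine features whose temporal lengths differ, one option is to resample them to a common length before concatenating along the feature axis. Here is a minimal sketch of that idea, assuming each extractor's output is already saved as a `(T, D)` NumPy array. The helper names `resample_time` and `fuse_features` are hypothetical and not part of this repository, and linear interpolation is only one possible alignment choice.

```python
import numpy as np


def resample_time(feats: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly interpolate a (T, D) feature array to (target_len, D) along time."""
    src_len = feats.shape[0]
    if src_len == target_len:
        return feats
    if src_len == 1:
        # A single time step cannot be interpolated, so repeat it.
        return np.repeat(feats, target_len, axis=0)
    src_pos = np.linspace(0.0, 1.0, src_len)
    dst_pos = np.linspace(0.0, 1.0, target_len)
    # np.interp works on 1-D data, so interpolate each feature dimension separately.
    columns = [np.interp(dst_pos, src_pos, feats[:, d]) for d in range(feats.shape[1])]
    return np.stack(columns, axis=1)


def fuse_features(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Align two (T, D) arrays to the shorter temporal length and concatenate them."""
    target_len = min(a.shape[0], b.shape[0])
    a = resample_time(a, target_len)
    b = resample_time(b, target_len)
    return np.concatenate([a, b], axis=1)


if __name__ == "__main__":
    # Assumed example sizes: 120 frame-wise vectors and 7 clip-wise vectors.
    frame_feats = np.random.rand(120, 2048)
    clip_feats = np.random.rand(7, 1024)
    print(fuse_features(frame_feats, clip_feats).shape)
```

Aligning to the shorter sequence avoids inventing temporal detail that the clip-wise features never had. Dense RAFT flow would first need to be reduced to a per-frame vector (for example by pooling) before it could be fused this way.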
I will convert it to a discussion as it is not an issue.
Thanks for your great code! In my recent work, I have to mix features from different networks, but the output feature sizes do not match. I want to mix the features from resnet50 and RAFT (or I3D). I don't know how to deal with that. Could someone help me? 😥