onlyonewater closed this issue 2 years ago
Sorry for the late response. The feature dimension can be any number that matches your pre-extracted features. If you want to get the most out of our pre-trained weights, though, SlowFast+ResNet101 is preferred. We explored other features during finetuning in the VALUE paper; the results in Table 9 and Section B.1 show that the pre-trained weights are transferable across different vision features.
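For anyone wondering what reusing the pre-trained weights with a different feature dimension could look like in practice, here is a minimal sketch. It is not this repo's loading code. It only illustrates the general idea of keeping the checkpoint entries whose shapes still match and re-initializing the rest (typically just the first projection over the video features). The parameter names, the hidden size of 768, and the `filter_compatible` helper are all hypothetical; the function works on any objects that expose a `.shape`, so the stand-ins below can be swapped for real tensors.

```python
from types import SimpleNamespace


def filter_compatible(pretrained, target):
    """Split checkpoint entries into those whose name and shape match the
    target model (kept) and those that do not (skipped)."""
    kept, skipped = {}, []
    for name, value in pretrained.items():
        if name in target and tuple(value.shape) == tuple(target[name].shape):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


def fake(*shape):
    """Stand-in for a tensor: only `.shape` matters in this sketch."""
    return SimpleNamespace(shape=shape)


# Hypothetical parameter names and hidden size (768); a real checkpoint
# will have different keys.
pretrained = {
    "video_embed.proj.weight": fake(768, 4352),  # trained on 4352-dim features
    "video_embed.proj.bias": fake(768),
    "encoder.layer0.weight": fake(768, 768),
}
target = {
    "video_embed.proj.weight": fake(768, 1024),  # model built for 1024-dim I3D
    "video_embed.proj.bias": fake(768),
    "encoder.layer0.weight": fake(768, 768),
}

kept, skipped = filter_compatible(pretrained, target)
print("reused:", sorted(kept))
print("re-initialized:", skipped)
```

With a PyTorch model, the `kept` dictionary could then be passed to `model.load_state_dict(kept, strict=False)`, so that only the input projection is trained from scratch on the new features.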
ok, I get it, thanks, I will have a try!
Does the dimension of the input video features have to be 4352? I want to use a pre-trained I3D model to extract features for my own dataset, and its feature dimension is 1024.