wuvei closed this issue 4 years ago
The extract tool writes out `.npy` files with clip features pooled in space and time. The `VideoModel` class you copied chops off the classification head, so that we get the clip features and not just a single class index. You can use the models directly and don't need the `VideoModel` class on top of it. Then you will also get the classification layer at the end:
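To illustrate the difference between the two outputs, here is a minimal sketch in plain Python. It does not use the actual models: the feature size of 512, the 400-class head, the random weights, and the helper names `argmax` and `classify` are all assumptions made for illustration. It shows that taking the argmax over a pooled feature vector gives an index into the feature dimensions, which can exceed the number of classes, while taking the argmax after a classification layer always gives a valid class index.

```python
import random

NUM_FEATURES = 512  # assumed size of the pooled clip feature vector
NUM_CLASSES = 400   # assumed number of classes, so valid indices are 0..399


def argmax(values):
    """Return the index of the largest element in a list."""
    return max(range(len(values)), key=values.__getitem__)


def classify(features, weights, bias):
    """Apply a linear classification head and return the top class index."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return argmax(logits)


rng = random.Random(0)

# Stand-in for clip features with the classification head chopped off.
features = [rng.gauss(0.0, 1.0) for _ in range(NUM_FEATURES)]

# Stand-in for a trained classification layer (random weights, illustration only).
weights = [
    [rng.gauss(0.0, 0.01) for _ in range(NUM_FEATURES)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

# Argmax over raw features: an index in 0..511, not a class index.
feature_index = argmax(features)

# Argmax over the head's outputs: always a class index in 0..399.
class_index = classify(features, weights, bias)

print("argmax over features:", feature_index)
print("argmax over logits:  ", class_index)
```

If the indices in question were computed from the pooled features rather than from the classification layer's outputs, this would explain values above 400.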
Hi, I am a beginner working on video classification with a pretrained model. I borrowed code from `extract.py` in cli and from other sources. The following code did produce some results, but they did not seem correct. In addition, for some videos, there were indices larger than 400 in `max_indices`. I would appreciate it if anyone could help with the code! `classes.json` is from here.