tianyu0207 / RTFM

Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]
324 stars 77 forks

How to extract the I3D-10crop feature? #33

Closed DungVo1507 closed 3 years ago

DungVo1507 commented 3 years ago

Hello, Yu Tian. I have 2 questions I hope you can respond to:

  • Can you tell me why the I3D 10-crop features of XD-Violence are not provided in this repo?
  • Can you show me the I3D 10-crop feature extraction code? The features I extracted have only 2 dimensions, not 3 dimensions like yours. Thank you!

tianyu0207 commented 3 years ago

> Hello, Yu Tian. I have 2 questions I hope you can respond to:
>
> • Can you tell me why the I3D 10-crop features of XD-Violence are not provided in this repo?
> • Can you show me the I3D 10-crop feature extraction code? The features I extracted have only 2 dimensions, not 3 dimensions like yours. Thank you!

For XD-Violence, you can just use the I3D features provided at https://roc-ng.github.io/XD-Violence/.

Or you can simply use this ResNet-50 I3D to extract the features: https://github.com/Tushar-N/pytorch-resnet3d.

The 10-crop I use is just the PyTorch official 10-crop function.

DungVo1507 commented 3 years ago

Thank you @tianyu0207. Currently, I have experimented on two datasets, ShanghaiTech and UCF-Crime. Although I have read the paper many times, I still don't fully understand, step by step, how your proposed method (RTFM) works.

Thank you very much!

chengengliu commented 3 years ago


Hi Yu Tian, I'm wondering what input you fed into the I3D-ResNet backbone. Did you feed every frame into it, or every 16 frames (i.e., one clip)?

DungVo1507 commented 3 years ago

Hi @chengengliu, I am also studying this paper. I have 4 questions that I hope you can explain if you understand them, thank you!

  1. After obtaining the final temporal feature representation X, the snippets have been divided into two groups, normal and abnormal, right?
  2. In the "select top-k snippets" stage, do you select k snippets from the normal and abnormal groups combined, or does each group select its own k snippets?
  3. Assuming k = 3, if a video has fewer than 3 abnormal or normal snippets, how will RTFM choose?
  4. When the input is a normal video, how does the RTFM-enabled snippet classifier learning stage classify it?
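For what it's worth, the paper describes ranking a video's snippets by feature magnitude (l2 norm) and selecting the top k. A minimal sketch of that selection, with an assumed clamp for videos shorter than k snippets (question 3); names and the clamp are illustrative, not RTFM's actual code:

```python
import torch

def topk_magnitude_snippets(features: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Select the k snippets with the largest l2 feature magnitude.

    features: (T, D) temporal features of one video.
    Returns the (k, D) selected snippet features. If the video has
    fewer than k snippets, k is clamped to T (an assumption here).
    """
    magnitudes = features.norm(p=2, dim=1)  # (T,) per-snippet magnitudes
    k = min(k, features.shape[0])           # handle short videos
    topk_idx = magnitudes.topk(k).indices   # indices of the largest norms
    return features[topk_idx]

# Toy example: 5 snippets with 4-dim features of increasing magnitude.
feats = torch.arange(20, dtype=torch.float32).reshape(5, 4)
selected = topk_magnitude_snippets(feats, k=3)
print(selected.shape)  # (3, 4)
```

The same selection is applied within the normal bag and the abnormal bag separately, so each bag contributes its own top-k snippets to the loss.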
tianyu0207 commented 3 years ago

> Hi Yu Tian, I'm wondering what input you fed into the I3D-ResNet backbone. Did you feed every frame into it, or every 16 frames (i.e., one clip)?

Hi, I fed every 16 frames (one clip) into it.
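In other words, the backbone consumes 16-frame clips, not individual frames. A minimal sketch of slicing a frame sequence into non-overlapping 16-frame clips (the handling of trailing frames is an assumption here, not the repo's exact code):

```python
import torch

def make_clips(frames: torch.Tensor, clip_len: int = 16) -> torch.Tensor:
    """Split (T, C, H, W) frames into (num_clips, C, clip_len, H, W) clips.

    Trailing frames that do not fill a whole clip are dropped; some
    implementations instead pad by repeating the last frame.
    """
    t = frames.shape[0] - frames.shape[0] % clip_len  # drop the remainder
    clips = frames[:t].reshape(-1, clip_len, *frames.shape[1:])  # (N, 16, C, H, W)
    return clips.permute(0, 2, 1, 3, 4)  # I3D expects (N, C, T, H, W)

# Toy video: 100 frames of 3x112x112 -> 6 full clips of 16 frames.
video = torch.rand(100, 3, 112, 112)
clips = make_clips(video)
print(clips.shape)  # torch.Size([6, 3, 16, 112, 112])
```

Each clip is then a single forward pass through I3D, producing one feature vector per clip (per crop).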

MTanveerR commented 2 years ago

Hi Yu Tian,

I am running your RTFM code and encountered the error below:

```
ValueError                                Traceback (most recent call last)
~/rtfm/main.py in <module>
     61     loadera_iter = iter(train_aloader)
     62
---> 63     train(loadern_iter, loadera_iter, model, args.batch_size, optimizer, viz, device)
     64
     65     if step % 5 == 0 and step > 200:

~/rtfm/train.py in train(nloader, aloader, model, batch_size, optimizer, viz, device)
     84     model.train()
     85
---> 86     ninput, nlabel = next(nloader)
     87     ainput, alabel = next(aloader)
     88

/opt/anaconda3/envs/latest/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    344     def __next__(self):
    345         index = self._next_index()  # may raise StopIteration
--> 346         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    347         if self._pin_memory:
    348             data = _utils.pin_memory.pin_memory(data)

/opt/anaconda3/envs/latest/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/opt/anaconda3/envs/latest/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

~/rtfm/dataset.py in __getitem__(self, index)
     56
     57         label = self.get_label()  # get video level label 0/1
---> 58         features = np.load(self.list[index].strip('\n'), allow_pickle=True)
     59
     60

/opt/anaconda3/envs/latest/lib/python3.7/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
    451         else:
    452             return format.read_array(fid, allow_pickle=allow_pickle,
--> 453                                      pickle_kwargs=pickle_kwargs)
    454     else:
    455         # Try a pickle

/opt/anaconda3/envs/latest/lib/python3.7/site-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
    766             array = array.transpose()
    767         else:
--> 768             array.shape = shape
    769
    770         return array

ValueError: cannot reshape array of size 262112 into shape (67,10,2048)
```

Could you please guide me on how this error can be resolved? Thanks, sir.
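For context on this error: 262112 is not divisible by 10 × 2048, so the saved .npy file simply cannot hold a (T, 10, 2048) ten-crop feature array; the features were likely extracted without 10-crop or with a different feature dimension. A small sanity check one could run on each feature array before training (the helper name is illustrative):

```python
import numpy as np

def check_feature(arr: np.ndarray, crops: int = 10, dim: int = 2048) -> bool:
    """Return True if arr can be viewed as (T, crops, dim) snippet features."""
    if arr.size % (crops * dim) != 0:
        return False  # element count cannot form (T, crops, dim)
    return arr.reshape(-1, crops, dim).shape[1:] == (crops, dim)

# A correctly extracted file: 67 snippets x 10 crops x 2048 dims.
good = np.zeros((67, 10, 2048), dtype=np.float32)
print(check_feature(good))  # True

# The failing case from the traceback: 262112 values cannot form
# (T, 10, 2048), so that file was saved with the wrong shape.
bad = np.zeros(262112, dtype=np.float32)
print(check_feature(bad))   # False
```

Running a check like this over every path in the train/test list files would pinpoint which feature files need to be re-extracted.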

DungVo1507 commented 2 years ago

@MTanveerR Hello, I also faced some difficulties when extracting the I3D features, but I overcame them thanks to this repo here

Hope it helps you!