shengyangsun / MSBT

Official implementation of "Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection"

How can I implement this model with only RGB and Flow features? #1

Open bmnptnt opened 1 month ago

bmnptnt commented 1 month ago

Hi, I'm currently studying Video Anomaly Detection. First, thank you for your contribution.

I'm wondering whether I can run your model without the audio features, that is, with only the RGB and Flow features. If this is possible, how should I modify your code?

Thank you.

shengyangsun commented 1 month ago

Thank you for your interest in our work.

Our MSBT module, as implemented in the MultiScaleBottleneckTransformer.py file, intrinsically supports the fusion of two modalities without requiring any modifications.

First, you need to modify the load_dataset.py file so that it only returns RGB and Flow features.
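As a rough illustration of that first step, here is a minimal sketch of a map-style dataset that returns only RGB and Flow features. The class name `RGBFlowDataset`, its arguments, and the `load_fn` parameter are hypothetical and are not taken from load_dataset.py; adapt the idea to however the existing loader reads its feature files.

```python
from typing import Any, Callable, Sequence, Tuple


class RGBFlowDataset:
    """Hypothetical map-style dataset: yields (rgb, flow, label), never audio.

    Any object with __len__ and __getitem__ can be passed to
    torch.utils.data.DataLoader, so no PyTorch import is needed here.
    """

    def __init__(
        self,
        rgb_paths: Sequence[str],
        flow_paths: Sequence[str],
        labels: Sequence[int],
        load_fn: Callable[[str], Any],
    ) -> None:
        # The two modalities must be aligned sample-by-sample.
        if not (len(rgb_paths) == len(flow_paths) == len(labels)):
            raise ValueError("rgb_paths, flow_paths and labels must have equal length")
        self.rgb_paths = list(rgb_paths)
        self.flow_paths = list(flow_paths)
        self.labels = list(labels)
        self.load_fn = load_fn  # e.g. numpy.load for .npy feature files

    def __len__(self) -> int:
        return len(self.rgb_paths)

    def __getitem__(self, idx: int) -> Tuple[Any, Any, int]:
        rgb = self.load_fn(self.rgb_paths[idx])
        flow = self.load_fn(self.flow_paths[idx])
        # The audio feature is simply not loaded or returned.
        return rgb, flow, self.labels[idx]
```

The training and test loops then need to unpack two feature tensors per sample instead of three.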

Next, you need to update the MultimodalTransformer.py file to ensure that only RGB and Flow are used during feature fusion. For example:

    f_v, f_f = self.fc_v(f_v), self.fc_f(f_f)
    f_v, f_f = self.msa(f_v), self.msa(f_f)
    f_vf, b_vf = self.MST(f_v, f_f)
    f_fv, b_fv = self.MST(f_f, f_v)
    bottle_cat = torch.cat([b_vf, b_fv], dim=1)
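To make the tensor shapes in this two-modality path easier to follow, here is a self-contained sketch that wires the same five lines into a runnable module. Everything around those lines is an assumption: `ToyBottleneckFusion` is a simplified stand-in for `self.MST` and is not the repository's MSBT module, and the feature dimensions, token count, and the meaning of the argument order are illustrative guesses only.

```python
import torch
import torch.nn as nn


class ToyBottleneckFusion(nn.Module):
    """Hypothetical stand-in for self.MST (NOT the repository's MSBT)."""

    def __init__(self, dim: int = 128, num_tokens: int = 4, heads: int = 4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(1, num_tokens, dim))
        self.gather = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spread = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor):
        # Bottleneck tokens condense the source modality ...
        tok = self.tokens.expand(src.size(0), -1, -1)
        bottleneck, _ = self.gather(tok, src, src)
        # ... and the target modality reads only from those tokens.
        fused, _ = self.spread(tgt, bottleneck, bottleneck)
        return tgt + fused, bottleneck


class RGBFlowFusion(nn.Module):
    """Assumed wrapper; layer names follow the snippet, sizes are made up."""

    def __init__(self, rgb_dim: int = 1024, flow_dim: int = 1024, dim: int = 128):
        super().__init__()
        self.fc_v = nn.Linear(rgb_dim, dim)
        self.fc_f = nn.Linear(flow_dim, dim)
        # One encoder layer shared by both modalities, as in the snippet.
        self.msa = nn.TransformerEncoderLayer(
            dim, nhead=4, dim_feedforward=2 * dim, batch_first=True
        )
        self.MST = ToyBottleneckFusion(dim)

    def forward(self, f_v: torch.Tensor, f_f: torch.Tensor):
        f_v, f_f = self.fc_v(f_v), self.fc_f(f_f)
        f_v, f_f = self.msa(f_v), self.msa(f_f)
        f_vf, b_vf = self.MST(f_v, f_f)
        f_fv, b_fv = self.MST(f_f, f_v)
        # Concatenate the bottleneck tokens of both directions along the token axis.
        bottle_cat = torch.cat([b_vf, b_fv], dim=1)
        return f_vf, f_fv, bottle_cat


if __name__ == "__main__":
    model = RGBFlowFusion().eval()
    rgb = torch.randn(2, 16, 1024)   # (batch, snippets, feature dim)
    flow = torch.randn(2, 16, 1024)
    f_vf, f_fv, bottle_cat = model(rgb, flow)
    print(f_vf.shape, f_fv.shape, bottle_cat.shape)
```

With three modalities there would be six ordered pairs to fuse; dropping audio leaves only the two RGB/Flow directions, so any layer that consumes `bottle_cat` or the fused features may need its input size adjusted accordingly.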

bmnptnt commented 1 month ago

Thank you for your quick and detailed response.