Open bmnptnt opened 1 month ago
Thank you for your interest in our work.
Our MSBT module, as implemented in the MultiScaleBottleneckTransformer.py
file, intrinsically supports fusing two modalities, so that file needs no modification.
First, you need to modify the load_dataset.py
file so that it returns only RGB and Flow features.
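As a rough sketch of that change, a dataset class could return just the two modalities. The class name, file layout, and feature shapes below are assumptions for illustration, not the repository's actual load_dataset.py:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class RGBFlowDataset(Dataset):
    """Hypothetical sketch: returns only RGB and Flow features per clip,
    dropping the audio branch entirely."""

    def __init__(self, rgb_paths, flow_paths, labels):
        assert len(rgb_paths) == len(flow_paths) == len(labels)
        self.rgb_paths = rgb_paths
        self.flow_paths = flow_paths
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Load precomputed features saved as .npy arrays (assumed format)
        f_v = torch.from_numpy(np.load(self.rgb_paths[idx])).float()   # RGB features
        f_f = torch.from_numpy(np.load(self.flow_paths[idx])).float()  # Flow features
        return f_v, f_f, self.labels[idx]  # no audio feature returned
```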
Next, you need to update the MultimodalTransformer.py
file to ensure that only RGB and Flow are used during feature fusion. For example:
```python
f_v, f_f = self.fc_v(f_v), self.fc_f(f_f)
f_v, f_f = self.msa(f_v), self.msa(f_f)
f_vf, b_vf = self.MST(f_v, f_f)
f_fv, b_fv = self.MST(f_f, f_v)
bottle_cat = torch.cat([b_vf, b_fv], dim=1)
```
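To see how those lines fit together, here is a self-contained toy version of the two-modality path. The stand-in modules (a shared `nn.MultiheadAttention` for `msa`, a cross-attention placeholder for `MST`, and all dimensions) are illustrative assumptions, not the actual MSBT implementation; the point is only that the two directional bottlenecks are concatenated along the token dimension:

```python
import torch
import torch.nn as nn

class TwoModalityFusion(nn.Module):
    """Toy stand-in for the RGB+Flow-only path. fc_v/fc_f project each
    modality, msa is a placeholder self-attention, and MST returns fused
    features plus a per-direction bottleneck tensor."""

    def __init__(self, d_in=1024, d_model=128, n_bottle=4):
        super().__init__()
        self.fc_v = nn.Linear(d_in, d_model)
        self.fc_f = nn.Linear(d_in, d_model)
        self.msa = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Learnable bottleneck tokens, expanded per batch
        self.bottle = nn.Parameter(torch.randn(1, n_bottle, d_model))

    def MST(self, src, tgt):
        # Placeholder cross-modal transfer: attend tgt to src, then form
        # a bottleneck summary from the fused features.
        fused, _ = self.msa(tgt, src, src)
        b = self.bottle.expand(src.size(0), -1, -1) + fused.mean(1, keepdim=True)
        return fused, b

    def forward(self, f_v, f_f):
        f_v, f_f = self.fc_v(f_v), self.fc_f(f_f)
        f_v, _ = self.msa(f_v, f_v, f_v)
        f_f, _ = self.msa(f_f, f_f, f_f)
        f_vf, b_vf = self.MST(f_v, f_f)
        f_fv, b_fv = self.MST(f_f, f_v)
        # Bottlenecks from both transfer directions, concatenated on dim=1
        bottle_cat = torch.cat([b_vf, b_fv], dim=1)
        return f_vf, f_fv, bottle_cat
```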
Thank you for your quick and detailed response.
Hi, I'm currently studying Video Anomaly Detection. First, thank you for your contribution.
I'm wondering whether I can use your model without the audio feature, i.e., with only the RGB and Flow features. If that's possible, how should I modify your code?
Thank you.