Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
I want to use CAM to analyze the network's attention on each frame in a gait recognition task.
The input video has shape (B=1, T, H, W). A 2D CNN such as ResNet extracts features from each frame, giving intermediate features of shape (T, d). These are passed through temporal aggregation to obtain the final feature, which is then classified. I set target_layer to the last layer of the ResNet, and I am not sure whether the generated heat map is valid.
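To make the shapes concrete, here is a toy NumPy sketch of the pipeline as described. It is not the real model and does not use the CAM library. It assumes (none of this is confirmed above) that spatial pooling is global average pooling, that temporal aggregation is a plain mean over T, and that the classifier is a single linear layer. The function name `per_frame_grad_cam` and all sizes are hypothetical. Under those assumptions the gradient of the class score with respect to the last conv activations has a closed form, so Grad-CAM can be written down directly and yields one map per frame.

```python
import numpy as np

def per_frame_grad_cam(activations, class_weights):
    """Grad-CAM for a toy head: GAP over (h, w), mean over T, linear classifier.

    activations:   (T, C, h, w) last-conv feature maps, one set per frame.
    class_weights: (C,) linear classifier weights of the target class.
    Returns:       (T, h, w) non-negative heat maps, one per frame.
    """
    T, C, h, w = activations.shape
    # Assumed head: score = sum_c w_c * mean_{t,i,j} A[t, c, i, j],
    # so d(score)/dA[t, c, i, j] = w_c / (T * h * w) for every position.
    grads = np.broadcast_to(
        class_weights[None, :, None, None] / (T * h * w), activations.shape
    )
    # Grad-CAM channel weights: spatial mean of the gradient per frame and channel.
    alpha = grads.mean(axis=(2, 3))  # (T, C)
    # Weighted sum over channels, then ReLU, as in standard Grad-CAM.
    cam = np.einsum("tc,tchw->thw", alpha, activations)
    return np.maximum(cam, 0.0)

# Hypothetical sizes: 4 frames, 8 channels, 7x7 feature maps.
rng = np.random.default_rng(0)
T, C, h, w = 4, 8, 7, 7
activations = rng.random((T, C, h, w))
class_weights = rng.standard_normal(C)

heatmaps = per_frame_grad_cam(activations, class_weights)
print(heatmaps.shape)  # (4, 7, 7)
```

With a learned or otherwise non-linear temporal module, the gradients differ from frame to frame and would have to come from backpropagation rather than this closed form; the sketch only illustrates how a (T, C, h, w) activation tensor turns into T separate heat maps.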