DACAT: Dual-stream Adaptive Clip-aware Time Modeling for Robust Online Surgical Phase Recognition
DACAT
DACAT consists of two main branches: (i) a Frame-wise Branch (FWB), which processes the frame-wise feature, and (ii) an Adaptive Clip-aware Branch (ACB), which reads out the clip most relevant to the current frame from a pre-trained feature cache and integrates these frame-wise features into an adaptive clip-aware feature through a cross-attention (CA) module. DACAT enhances the relevant context and filters out interference for the current frame, which reduces the complexity of temporal processing and leads to more accurate phase identification.
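To make the cross-attention step more concrete, here is a minimal, dependency-free sketch of single-head scaled dot-product cross-attention. It is not the repository's implementation: it assumes the current frame's feature acts as the query and the cached clip features act as keys and values, and it omits the learned projections a real CA module would have. The names `cross_attention`, `current`, and `clip` are hypothetical.

```python
import math

def cross_attention(query, clip_features):
    """Single-head scaled dot-product cross-attention (no learned projections).

    query: feature vector of the current frame (list of floats).
    clip_features: cached frame-wise feature vectors of the selected clip.
    Returns (attended feature vector, attention weights over the clip).
    """
    dim = len(query)
    # Similarity of the current frame to each cached frame, scaled by sqrt(dim).
    scores = [sum(q * k for q, k in zip(query, feat)) / math.sqrt(dim)
              for feat in clip_features]
    # Numerically stable softmax over the frames in the clip.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of cached features: frames similar to the query dominate.
    attended = [sum(w * feat[i] for w, feat in zip(weights, clip_features))
                for i in range(dim)]
    return attended, weights

# Toy example (assumed data): two cached frames match the current frame, one does not.
current = [1.0, 0.0]
clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
feature, weights = cross_attention(current, clip)
```

In this toy case the two cached frames that match the current frame receive equal and larger weights than the mismatched one, which is the sense in which attention emphasizes relevant context and down-weights interference.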
Result
1. Preparation
Step 1:
Download the Cholec80, M2CAI16, and AutoLaparo datasets.
- Access can be requested at: [Cholec80](http://camma.u-strasbg.fr/datasets), [M2CAI16](http://camma.u-strasbg.fr/datasets), [AutoLaparo](https://autolaparo.github.io/).
- Download the videos for each dataset and extract frames at 1 fps. E.g., for `video01.mp4` with ffmpeg, run:
```bash
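# The path prefixes before /data and /video01.mp4 are elided here; substitute your own locations.
mkdir -p //data/frames_1fps/01/
ffmpeg -hide_banner -i //video01.mp4 -r 1 -start_number 0 //data/frames_1fps/01/%08d.jpg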
```
- We also provide a shell script for frame extraction [here](src/video2img.sh).
- The final dataset structure should look like this:
```
Cholec80/
    data/
        frames_1fps/
            01/
                00000001.jpg
                00000002.jpg
                00000003.jpg
                00000004.jpg
                ...
            02/
                ...
            ...
            80/
                ...
        phase_annotations/
            video01-phase.txt
            video02-phase.txt
            ...
            video80-phase.txt
        tool_annotations/
            video01-tool.txt
            video02-tool.txt
            ...
            video80-tool.txt
    output/
    train_scripts/
    predict.sh
    train.sh
```
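Before training, it can help to confirm that the extracted frames and annotations are where the scripts expect them. The following is a small sketch of such a check, not part of the repository. It assumes that `frames_1fps/` and `phase_annotations/` sit under `data/` inside the dataset root and that videos are numbered `01` to `80` as in Cholec80; the function name `check_layout` and its parameters are hypothetical.

```python
from pathlib import Path

def check_layout(root, num_videos=80):
    """Return a list of problems found under the dataset root (empty if none)."""
    root = Path(root)
    problems = []
    for idx in range(1, num_videos + 1):
        # Assumed location: <root>/data/frames_1fps/<two-digit video id>/
        frames_dir = root / "data" / "frames_1fps" / f"{idx:02d}"
        if not frames_dir.is_dir():
            problems.append(f"missing frame folder: {frames_dir}")
        elif not any(frames_dir.glob("*.jpg")):
            problems.append(f"no .jpg frames in: {frames_dir}")
        # Assumed location: <root>/data/phase_annotations/videoNN-phase.txt
        phase = root / "data" / "phase_annotations" / f"video{idx:02d}-phase.txt"
        if not phase.is_file():
            problems.append(f"missing phase annotation: {phase}")
    return problems
```

Calling `check_layout("Cholec80")` from the directory that contains the dataset folder returns an empty list when every video has a frame folder with at least one image and a phase annotation file. For datasets with a different number of videos, pass `num_videos` accordingly; the file naming for M2CAI16 and AutoLaparo may differ from what is assumed here.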
Step 2:
Download the pretrained ConvNeXt V2-T model
- Download the ConvNeXt V2-T [weights](https://dl.fbaipublicfiles.com/convnext/convnextv2/im1k/convnextv2_tiny_1k_224_ema.pt) and place them here: `.../train_scripts/convnext/convnextv2_tiny_1k_224_ema.pt`
Step 3:
Environment Requirements
See [requirements.txt](requirements.txt).
2. Train
2.1 Train Feature Cache
source .../Cholec80/train.sh
After training, rename the checkpoint `.../output/checkpoints/phase/YourTrainNameXXX/models/checkpoint_best_acc.pth.tar` and save it as `.../train_scripts/newly_opt_ykx/LongShortNet/long_net_convnextv2.pth.tar`.
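The rename-and-save step can also be scripted. The sketch below is one possible way to do it and is not part of the repository. It copies the file rather than moving it, so the original training output stays intact. The helper name `stage_checkpoint` is hypothetical, and both paths must be supplied by you, since the leading `...` in the paths above depends on your local setup.

```python
import shutil
from pathlib import Path

def stage_checkpoint(best_ckpt, target):
    """Copy the best feature-cache checkpoint to the path the next stage reads."""
    best_ckpt, target = Path(best_ckpt), Path(target)
    if not best_ckpt.is_file():
        raise FileNotFoundError(f"checkpoint not found: {best_ckpt}")
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as the modification time.
    shutil.copy2(best_ckpt, target)
    return target
```

For example, pass the path ending in `models/checkpoint_best_acc.pth.tar` as `best_ckpt` and the path ending in `LongShortNet/long_net_convnextv2.pth.tar` as `target`.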
2.2 Train DACAT
Edit `.../Cholec80/train.sh` so that the `python3 train_longshort.py` command is active (i.e., not commented out), then run:
source .../Cholec80/train.sh
3. Infer
Set the model path in `.../Cholec80/predict.sh`, then run:
source .../Cholec80/predict.sh
Our trained checkpoints can be downloaded from Google Drive.
4. Evaluate
4.1 Cholec80
Use the Matlab file.
4.2 M2CAI16
Use the Matlab file.
4.3 AutoLaparo
Use the Python file.
Reference
Citations
If you find this repository useful, please consider citing our paper:
@article{yang2024dacat,
title={DACAT: Dual-stream Adaptive Clip-aware Time Modeling for Robust Online Surgical Phase Recognition},
author={Yang, Kaixiang and Li, Qiang and Wang, Zhiwei},
journal={arXiv preprint arXiv:2409.06217},
year={2024}
}