DACAT: Dual-stream Adaptive Clip-aware Time Modeling for Robust Online Surgical Phase Recognition

DACAT

DACAT consists of two main branches, $\textit{i.e.}$, (i) Frame-wise Branch (FWB) processing the frame-wise feature and (ii) Adaptive Clip-aware Branch (ACB) which reads out the most relevant clip with the current frame from pre-trained feature cache and integrates these frame-wise features into adaptive clip-aware feature through cross-attention (CA) module. DACAT enhances the relevant context and filter out interference for current frame, which reduces the the complexity of temporal processing and leads to more accurate phase identification.

Result

1. Preparation

Step 1:

Download the Cholec80, M2CAI16, AutoLaparo

- Access can be requested [Cholec80](http://camma.u-strasbg.fr/datasets), [M2CAI16](http://camma.u-strasbg.fr/datasets), [AutoLaparo](https://autolaparo.github.io/). - Download the videos for each datasets and extract frames at 1fps. E.g. for `video01.mp4` with ffmpeg, run: ```bash mkdir //data/frames_1fps/01/ ffmpeg -hide_banner -i //video01.mp4 -r 1 -start_number 0 //data/frames_1fps/01/%08d.jpg ``` - We also prepare a shell file to extract at [here](src/video2img.sh) - The final dataset structure should look like this: ``` Cholec80/ data/ frames_1fps/ 01/ 00000001.jpg 00000002.jpg 00000003.jpg 00000004.jpg ... 02/ ... ... 80/ ... phase_annotations/ video01-phase.txt video02-phase.txt ... video80-phase.txt tool_annotations/ video01-tool.txt video02-tool.txt ... video80-tool.txt output/ train_scripts/ predict.sh train.sh ```

Step 2:

Download pretrained models ConvNeXt V2-T

- download ConvNeXt V2-T [weights](https://dl.fbaipublicfiles.com/convnext/convnextv2/im1k/convnextv2_tiny_1k_224_ema.pt) and place here: `.../train_scripts/convnext/convnextv2_tiny_1k_224_ema.pt`

Step 3:

Environment Requirements

See [requirements.txt](requirements.txt).

2. Train

2.1 Train Feature Cache

source .../Cholec80/train.sh

After training, please rename and save the checkpoint .../output/checkpoints/phase/YourTrainNameXXX/models/checkpoint_best_acc.pth.tar in .../train_scripts/newly_opt_ykx/LongShortNet/long_net_convnextv2.pth.tar

2.2 Train DACAT

Change the .../Cholec80/train.sh, make python3 train_longshort.py active and

source .../Cholec80/train.sh

3. Infer

Set the model path in .../Cholec80/predict.sh and

source .../Cholec80/predict.sh

Our trained checkpoints can be download in google drive.

4. Evaluate

Reference

BNPitfalls (MIA 24)

Citations

If you find this repository useful, please consider citing our paper:

@article{yang2024dacat,
  title={DACAT: Dual-stream Adaptive Clip-aware Time Modeling for Robust Online Surgical Phase Recognition},
  author={Yang, Kaixiang and Li, Qiang and Wang, Zhiwei},
  journal={arXiv preprint arXiv:2409.06217},
  year={2024}
}

kk42yy / DACAT

readme