Hi. Thanks for your interest in our work.
To apply CAT to 3D images, I think modifying the mask alone is not enough (although the model will run normally once the mask is modified). You also need to extend Rwin-SA to three dimensions, in particular the window partition. To implement these parts, you can refer to Video Swin Transformer, which extends the (2D) Swin Transformer to 3D video.
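To make the window partition point concrete, here is a minimal sketch of what a 3D partition could look like. It is not code from the CAT repository: the function names `window_partition_3d` and `window_reverse_3d`, the channels-last layout `(B, D, H, W, C)`, and the window size in the example are my assumptions for illustration. It uses NumPy so it runs on its own; in PyTorch the same reshape/transpose pattern would be written with `view` and `permute`.

```python
import numpy as np


def window_partition_3d(x, window_size):
    """Split a (B, D, H, W, C) volume into non-overlapping 3D windows.

    window_size is (wd, wh, ww); it may be non-cubic, like a rectangle window.
    Returns an array of shape (B * num_windows, wd * wh * ww, C).
    """
    B, D, H, W, C = x.shape
    wd, wh, ww = window_size
    # Assumption: the volume is already padded to a multiple of the window size.
    assert D % wd == 0 and H % wh == 0 and W % ww == 0
    # Split each spatial axis into (number of windows, window extent).
    x = x.reshape(B, D // wd, wd, H // wh, wh, W // ww, ww, C)
    # Bring the three window-index axes forward, the three in-window axes after.
    x = x.transpose(0, 1, 3, 5, 2, 4, 6, 7)
    return x.reshape(-1, wd * wh * ww, C)


def window_reverse_3d(windows, window_size, B, D, H, W):
    """Inverse of window_partition_3d: merge windows back into (B, D, H, W, C)."""
    wd, wh, ww = window_size
    C = windows.shape[-1]
    x = windows.reshape(B, D // wd, H // wh, W // ww, wd, wh, ww, C)
    # Interleave each window-index axis with its in-window axis again.
    x = x.transpose(0, 1, 4, 2, 5, 3, 6, 7)
    return x.reshape(B, D, H, W, C)


if __name__ == "__main__":
    # Hypothetical sizes: 2 volumes of 4 x 8 x 8 voxels with 3 channels.
    x = np.arange(2 * 4 * 8 * 8 * 3).reshape(2, 4, 8, 8, 3)
    windows = window_partition_3d(x, (2, 4, 8))
    print(windows.shape)
    restored = window_reverse_3d(windows, (2, 4, 8), 2, 4, 8, 8)
    print(np.array_equal(restored, x))
```

Self-attention would then be computed within each row of the partitioned array, and the attention mask has to be built with the same partition so that it lines up with the windows. How the rectangle shape should alternate across axes in 3D is a design choice that this sketch does not settle.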
If you have any other questions, please let us know. Thanks.