VisionTransformerCP vs VisionTransformerLadder

sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Apache License 2.0

Closed by tqosu 4 weeks ago

tqosu commented 1 month ago

Hi Shuming,

What are VisionTransformerCP and VisionTransformerLadder?

Thanks.

sming256 commented 4 weeks ago

VisionTransformerCP is a rewrite of the official VisionTransformer used in VideoMAE that adds support for activation checkpointing (with_cp=True).
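For readers unfamiliar with activation checkpointing, here is a minimal sketch of how a `with_cp` flag is commonly wired into a block using `torch.utils.checkpoint`. The `Block` class and the `grads_match` helper are hypothetical and written only for illustration; this is not the OpenTAD or VideoMAE code. With checkpointing on, intermediate activations are recomputed in the backward pass rather than stored, trading compute for memory while leaving the gradients unchanged.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Toy residual MLP block (hypothetical, not the OpenTAD implementation)."""

    def __init__(self, dim: int, with_cp: bool = False):
        super().__init__()
        self.with_cp = with_cp
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def _inner_forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.with_cp and x.requires_grad:
            # Do not keep intermediate activations; recompute them in backward.
            return checkpoint(self._inner_forward, x, use_reentrant=False)
        return self._inner_forward(x)


def grads_match(dim: int = 8) -> bool:
    """Check that checkpointing does not change the input gradient."""
    torch.manual_seed(0)
    plain = Block(dim, with_cp=False)
    ckpt = Block(dim, with_cp=True)
    ckpt.load_state_dict(plain.state_dict())  # identical weights

    x1 = torch.randn(2, 4, dim, requires_grad=True)
    x2 = x1.detach().clone().requires_grad_(True)
    plain(x1).sum().backward()
    ckpt(x2).sum().backward()
    return torch.allclose(x1.grad, x2.grad)
```

Calling `grads_match()` compares the input gradients of the two variants; the only intended difference between them is peak memory during training.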

VisionTransformerLadder is the side-tuning variant of the VideoMAE backbone, introduced in Figure 5 of AdaTAD.
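As a rough illustration of the general side-tuning idea (a frozen backbone plus a small trainable side branch that reads the backbone's intermediate features), here is a generic sketch. It is an assumption-laden toy: the `ToySideTuning` class, its layer sizes, and the way features are fused are all hypothetical and are not taken from AdaTAD Figure 5 or from the VisionTransformerLadder code.

```python
import torch
import torch.nn as nn


class ToySideTuning(nn.Module):
    """Generic side-tuning toy (hypothetical, not VisionTransformerLadder)."""

    def __init__(self, dim: int = 16, side_dim: int = 4, depth: int = 3):
        super().__init__()
        self.side_dim = side_dim
        # Stand-in for a pretrained backbone; its weights are frozen.
        self.backbone = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)]
        )
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Lightweight trainable side branch, one stage per backbone block.
        self.down = nn.ModuleList([nn.Linear(dim, side_dim) for _ in range(depth)])
        self.side = nn.ModuleList(
            [nn.Linear(side_dim, side_dim) for _ in range(depth)]
        )
        self.up = nn.Linear(side_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        side = torch.zeros(
            *x.shape[:-1], self.side_dim, device=x.device, dtype=x.dtype
        )
        for block, down, layer in zip(self.backbone, self.down, self.side):
            with torch.no_grad():
                # No autograd graph is built through the backbone.
                x = block(x)
            # The side branch consumes the backbone feature at each depth.
            side = side + torch.relu(layer(side + down(x)))
        return x + self.up(side)
```

In this sketch, gradients flow only through the side branch, so the backbone's activations do not need to be kept for the backward pass.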

tqosu commented 4 weeks ago

Thanks.