mahmoodlab / HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

Pretrained ViTWSI-4096 model #37

Closed KiyoshiMu closed 1 year ago

KiyoshiMu commented 1 year ago

Thanks for sharing this excellent work! The method is both amazing and elegant.

I wonder whether there is a pretrained ViTWSI-4096 (n = 2, h = 3, d = 192) that aggregates the [CLS]4096 tokens and generates a slide-level representation.

JuanDuranMcgill commented 1 year ago

I would be interested in this too.

Richarizardd commented 1 year ago

Oops - I closed this issue without comment. At the moment, there is no ViT pretrained on [4096 x 4096] tokens, but it is exciting future work!
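
For anyone who wants to experiment in the meantime, here is a minimal sketch of what an aggregator with the configuration from the question (n = 2 layers, h = 3 heads, d = 192) might look like in PyTorch. It is randomly initialized, not pretrained, and is not the repository's own code: the class name `SlideAggregator`, the feed-forward width, and the mean pooling over region tokens are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SlideAggregator(nn.Module):
    """Hypothetical slide-level aggregator over [CLS]4096 region tokens.

    Assumptions (not from the HIPT repo): feed-forward width equals `dim`,
    and the slide embedding is the mean of the encoded region tokens.
    """

    def __init__(self, dim: int = 192, heads: int = 3, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim,
            nhead=heads,
            dim_feedforward=dim,
            batch_first=True,  # inputs are (batch, num_regions, dim)
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, region_tokens: torch.Tensor) -> torch.Tensor:
        # region_tokens: one 192-d [CLS]4096 token per 4096 x 4096 region
        encoded = self.encoder(region_tokens)
        # Mean-pool across regions to get a single slide-level vector
        return encoded.mean(dim=1)


model = SlideAggregator().eval()
tokens = torch.randn(1, 50, 192)  # dummy slide with 50 regions
with torch.no_grad():
    slide_embedding = model(tokens)
print(slide_embedding.shape)  # torch.Size([1, 192])
```

Since the weights are random, the output is only useful as a starting point for training your own aggregator on precomputed region tokens (for example with a weakly supervised slide-level objective).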