mahmoodlab / HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

Some Questions #27

Closed bryanwong17 closed 1 year ago

bryanwong17 commented 1 year ago

Hi @Richarizardd,

I would be glad if you could answer some of my questions below:

  1. Before running main_dino4k.py, I noticed you saved all [256-Length x 384-Dim] tensors for the input, which correspond to the extracted ViT-16 features for a 4K x 4K patch. May I know where in the code you did that? Do I need to extract 4K x 4K patches in JPG or PNG format to get the input tensors?

  2. Should I change the equation below if a batch size of 64 can't be used? How about the learning rate?

`args.lr * (args.batch_size_per_gpu * utils.get_world_size()) / 256.`

  3. Is HIPT_LGP_FC your main model?

Thanks for your time and kindness!

Richarizardd commented 1 year ago
  1. 4K feature extraction can be performed by modifying CLAM to do 4K patching (instead of 256 patching; simply pass in patch_size = 4096), followed by using the HIPT_4K API as the feature extractor.
  2. I think you can keep the equation as-is.
  3. HIPT_LGP_FC is the main model.
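For context on question 2, the equation in question is the standard DINO-style linear learning-rate scaling rule: the base learning rate is scaled by the effective batch size (per-GPU batch size times number of GPUs) relative to a reference batch of 256. A minimal sketch of that rule (the `base_lr` value and helper name here are illustrative, not from the HIPT code):

```python
def scale_lr(base_lr: float, batch_size_per_gpu: int, world_size: int) -> float:
    """Linear LR scaling rule: scale base_lr by effective batch size / 256.

    base_lr is assumed to be the learning rate intended for an
    effective batch size of 256 (the DINO convention).
    """
    effective_batch = batch_size_per_gpu * world_size
    return base_lr * effective_batch / 256.0

# Example: a smaller batch (32 per GPU on 2 GPUs -> effective batch 64)
# automatically yields a proportionally smaller learning rate.
print(scale_lr(0.0005, 32, 2))  # 0.0005 * 64 / 256 = 0.000125
```

This is why the equation can be kept as-is when the batch size changes: the scaling happens automatically, so no manual learning-rate adjustment is needed.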