> 4K feature extraction can be performed by modifying CLAM to do 4K patching (instead of 256 x 256 patching; simply pass in `patch_size = 4096`), followed by using the `HIPT_4K` API as the feature extractor.
Hi @Richarizardd,
I would be glad if you can answer some of my questions below:
Before running `main_dino4k.py`, I noticed that the inputs are saved as [256-Length x 384-Dim] tensors, which correspond to the extracted ViT-16 features for each 4K x 4K patch. May I know which part of the code does this? Do I need to extract the 4K x 4K patches in JPG or PNG format to obtain the input tensors?
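For reference, here is how I currently picture the pre-extraction step: tile each 4K x 4K patch into a 16 x 16 grid of 256 x 256 sub-patches and embed each with a ViT-S/16 (384-dim output). This is only my own sketch, not the repo's actual code; `vit16` is a hypothetical stand-in for whatever ViT-16 feature extractor is used:

```python
import torch

def extract_4k_features(img_4k, vit16):
    """Turn one [3, 4096, 4096] patch into a [256, 384] feature tensor.

    vit16: any callable mapping [N, 3, 256, 256] -> [N, 384]
    (e.g. a ViT-S/16 with 384-dim embeddings).
    """
    # Tile into a 16 x 16 grid of 256 x 256 sub-patches: [3, 16, 16, 256, 256]
    patches = img_4k.unfold(1, 256, 256).unfold(2, 256, 256)
    # Flatten the grid into a batch of 256 sub-patches: [256, 3, 256, 256]
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3, 256, 256)
    with torch.no_grad():
        feats = vit16(patches)  # [256, 384]
    return feats
```

Is this roughly what happens internally, and if so, where in the codebase?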
Should I change the equation below if a batch size of 64 cannot be used? How about the learning rate?
```python
args.lr * (args.batch_size_per_gpu * utils.get_world_size()) / 256.
```
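As I understand it, this is the standard DINO linear scaling rule: the learning rate grows with the total batch size, normalized by a reference batch size of 256. A small sketch of how I read it (the function name and values here are my own placeholders, not from the repo):

```python
def scaled_lr(base_lr, batch_size_per_gpu, world_size):
    # Linear scaling: lr is proportional to the total batch size
    # (per-GPU batch size x number of GPUs), normalized by 256.
    return base_lr * (batch_size_per_gpu * world_size) / 256.

# e.g. with a base lr of 5e-4 on a single GPU with batch size 64:
print(scaled_lr(5e-4, 64, 1))  # 0.000125
```

So if I drop the batch size from 64 to, say, 32, would the rule above already compensate, or should the base learning rate also be tuned?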
Thanks for your time and kindness!