mahmoodlab / HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

Some Questions #27

Closed bryanwong17 closed 1 year ago

bryanwong17 commented 1 year ago

Hi @Richarizardd,

I would be glad if you could answer some of my questions below:

  1. Before running main_dino4k.py, I noticed you saved all [256-Length x 384-Dim] tensors for the input, which correspond to the extracted ViT-16 features for a 4K x 4K patch. May I know where in the code you did that? Do I need to extract 4K x 4K patches in JPG or PNG format to get the input tensors?

  2. Should I change the equation below if a batch size of 64 can't be used? How about the learning rate?

`args.lr * (args.batch_size_per_gpu * utils.get_world_size()) / 256.`

  3. Is HIPT_LGP_FC your main model?

Thanks for your time and kindness!

Richarizardd commented 1 year ago
  1. 4K feature extraction can be performed by modifying CLAM to do 4K patching (instead of 256 patching; simply pass in patch_size = 4096), followed by using the HIPT_4K API as the feature extractor.
  2. I think you can keep the equation as-is.
  3. HIPT_LGP_FC is the main model.
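For context on question 2, the equation in question is the standard DINO-style linear learning-rate scaling rule: the base learning rate is scaled by the effective batch size (per-GPU batch size times number of GPUs) relative to a reference batch of 256. A minimal sketch of that rule (the `base_lr` value and helper name here are illustrative, not from the HIPT code):

```python
def scale_lr(base_lr: float, batch_size_per_gpu: int, world_size: int) -> float:
    """Linear LR scaling rule: scale base_lr by effective batch size / 256.

    base_lr is assumed to be the learning rate intended for an
    effective batch size of 256 (the DINO convention).
    """
    effective_batch = batch_size_per_gpu * world_size
    return base_lr * effective_batch / 256.0

# Example: a smaller batch (32 per GPU on 2 GPUs -> effective batch 64)
# automatically yields a proportionally smaller learning rate.
print(scale_lr(0.0005, 32, 2))  # 0.0005 * 64 / 256 = 0.000125
```

This is why the equation can be kept as-is when the batch size changes: the scaling happens automatically, so no manual learning-rate adjustment is needed.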