Closed haribaskarsony closed 3 months ago

Hi, I have a custom ViT-L model with a different patch_size (n) and some register token layers. What changes do I need to make in the script to refine my model?
The resolution of the global and local crops needs to be divisible by your patch size. We use patch sizes of 14 and 16, so you can see an example there: we use 224 as the resolution for the global crops, and for the local crops we use either 96 (for patch_size 16) or 98 (for patch_size 14).
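A quick way to sanity-check crop resolutions (a minimal sketch; the function name and candidate values are illustrative, not from the repo):

```python
# Minimal sketch: check whether a crop resolution can be patchified.
def is_valid_resolution(resolution: int, patch_size: int) -> bool:
    # a crop can only be split into patches if patch_size divides it evenly
    return resolution % patch_size == 0

patch_size = 14
for resolution in [96, 98, 224]:
    print(resolution, is_valid_resolution(resolution, patch_size))
# -> 96 False, 98 True, 224 True for patch_size 14
```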
Depending on how you add the register tokens, you don't need to change anything. Each ID head has a "pooling" which extracts the corresponding token to use. By default we use the first token ("cls"), which corresponds to the CLS token. You can simply adjust the ClassToken pooling implementation and you should be ready to go.
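For illustration, a pooling that still extracts the CLS token when register tokens shift its position could look like this (a minimal sketch, assuming a [reg_tokens..., cls_token, patch_tokens...] layout; not the repo's ClassToken class):

```python
import torch
from torch import nn

class ClassTokenPooling(nn.Module):
    def __init__(self, num_reg_tokens: int = 0):
        super().__init__()
        # if register tokens are prepended, the CLS token is shifted by their count
        self.cls_index = num_reg_tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, seq_len, dim) -> (batch_size, dim)
        return x[:, self.cls_index]
```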
Hi @BenediktAlkin, thank you for the reply. I encountered a different issue while loading my model for refining. It seems that I need to modify the model backbone structure to load my model. Could you please point me to the files where I can do this?
reg_token is an additional layer, just like cls_token. I need to explicitly specify the dimension of the reg_token layer to initialize and load my model weights.
This is the file where the ViT is implemented.
You can add your register tokens there and also adjust load_state_dict to load them correctly.
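For reference, a minimal sketch of what that could look like, assuming the checkpoint may or may not contain register tokens (class and parameter names are illustrative, not the repo's ViT implementation):

```python
import torch
from torch import nn

class ViTWithRegisters(nn.Module):
    def __init__(self, dim: int = 1024, num_reg_tokens: int = 4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # register tokens are learned parameters, just like the CLS token
        self.reg_tokens = nn.Parameter(torch.zeros(1, num_reg_tokens, dim))

    def prepend_tokens(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch_size, num_patches, dim)
        batch_size = patch_tokens.size(0)
        cls = self.cls_token.expand(batch_size, -1, -1)
        reg = self.reg_tokens.expand(batch_size, -1, -1)
        return torch.cat([cls, reg, patch_tokens], dim=1)

    def load_state_dict(self, state_dict, strict=True):
        # if the checkpoint has no register tokens yet, keep the randomly
        # initialized ones instead of failing on the missing key
        if "reg_tokens" not in state_dict:
            state_dict = dict(state_dict)
            state_dict["reg_tokens"] = self.reg_tokens.data
        return super().load_state_dict(state_dict, strict=strict)
```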
Hi @BenediktAlkin, I'm curious about the proper model configuration to set here: https://github.com/ml-jku/MIM-Refiner/blob/main/src/yamls/stage2/l16_d2v2.yaml

Most importantly, the parameter "kind" under [model, encoders, initializers] for model initialization. This is the current model config that I have:

What is the criterion for choosing a given "kind" name, and which configuration would be right for me? This is the model backbone I'm hoping to get: https://github.com/kyegomez/Vit-RGTS/blob/main/vit_rgts/main.py
The kind property is populated by the initializer, as different models can have different ViTs (e.g. D2V2 uses a post-norm ViT whereas all the others use a pre-norm ViT). You can find the exact code here. So you would need to adjust the logic of the pretrained_initializer to set the kind to your custom model name.
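As a rough illustration of that logic (the key names and the mapping below are assumptions made for the sketch, not the actual pretrained_initializer code):

```python
# Sketch: derive the "kind" string from the contents of a checkpoint.
# All key names here are assumed markers, not the real checkpoint keys.
def derive_kind(state_dict: dict) -> str:
    if "reg_tokens" in state_dict:
        # a checkpoint with register tokens -> route to your custom class
        return "vit.custom_model"
    if "blocks.0.postnorm1.weight" in state_dict:
        # hypothetical marker for a post-norm ViT such as D2V2
        return "vit.postnorm_vit"
    return "vit.prenorm_vit"
```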
I can see there are three different "kind" parameters in the config, so it's kind of confusing how the model initialization itself is organized.
The kind corresponds to a file where a class lies; it will be instantiated according to the location in the yaml where it appears. The kind in the model will instantiate a ContrastiveModel, the kind of the encoder will instantiate a Vit model, and the kind of the initializer will instantiate a PretrainedInitializer. This is a factory pattern, which makes the yaml configuration much easier.
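In other words, something like the following (a minimal sketch of such a factory; the module path and naming convention are assumptions, not the repo's exact resolver):

```python
import importlib

def instantiate(kind: str, **kwargs):
    # "vit.custom_model" -> module models.vit.custom_model, class CustomModel
    module = importlib.import_module(f"models.{kind}")
    # assume the class name is the CamelCase version of the file name
    name = kind.rsplit(".", 1)[-1]
    class_name = "".join(part.capitalize() for part in name.split("_"))
    return getattr(module, class_name)(**kwargs)

# usage: instantiate("vit.custom_model", patch_size=14) -> CustomModel(patch_size=14)
```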
What I infer now is that I should focus on the "kind" and the other parameters under the encoders subsection. Am I correct in assuming that?
Ideally you put your custom model in the same folder as the ViTs, for example with the filename custom_model containing a class CustomModel, and then you only have to change the code in the initializer to fill in kind: vit.custom_model when loading your custom checkpoint.
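Put together, the two pieces could look roughly like this (a sketch under the assumptions above; the file path, constructor arguments, and helper name are illustrative):

```python
# src/models/vit/custom_model.py (assumed location, next to the other ViTs)
import torch
from torch import nn

class CustomModel(nn.Module):
    def __init__(self, patch_size: int = 14, dim: int = 1024, num_reg_tokens: int = 4):
        super().__init__()
        self.patch_size = patch_size
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.reg_tokens = nn.Parameter(torch.zeros(1, num_reg_tokens, dim))
        # ... patch embedding and transformer blocks go here

# in the pretrained initializer (illustrative function, not the real one):
def get_model_kind(checkpoint_path: str) -> str:
    if "custom" in checkpoint_path:
        return "vit.custom_model"
    return "vit.prenorm_vit"  # assumed default for the released checkpoints
```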
I forgot to ask at the beginning: should I explicitly use the kappamodules package to define the model layers?
You can, but it's not necessary; kappamodules is an independent collection of modules such as transformer blocks, but you can implement your own as well.
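For example, a hand-rolled pre-norm transformer block in plain PyTorch would also do (a minimal sketch, not the kappamodules implementation):

```python
import torch
from torch import nn

class PrenormBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # pre-norm residual attention
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # pre-norm residual MLP
        return x + self.mlp(self.norm2(x))
```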
Hi, for the pos_embed dimension that I have, there is no specific implementation in https://github.com/ml-jku/MIM-Refiner/blob/main/src/initializers/pretrained_initializer.py. If I add a case for my own pos_embed dimension [1, n, 1024], does it have an impact on refining?
No, this is only for loading the model; the refinement is agnostic to the model architecture.
I got a NotImplementedError in this case.
Obviously your case is not implemented, but you also don't need it, so you can comment it out.
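If you prefer to keep the check instead of commenting it out, extending the shape dispatch could look like this (a sketch only; the concrete shapes and branching in the real pretrained_initializer.py differ):

```python
import torch

def check_pos_embed(pos_embed: torch.Tensor) -> None:
    # known case, e.g. ViT-L/14 at resolution 224 with a CLS token (assumed)
    if pos_embed.shape == (1, 257, 1024):
        return
    # added case for a [1, n, 1024] pos_embed; loading is unaffected otherwise
    if pos_embed.ndim == 3 and pos_embed.shape[0] == 1 and pos_embed.shape[2] == 1024:
        return
    raise NotImplementedError(f"unsupported pos_embed shape {tuple(pos_embed.shape)}")
```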
@BenediktAlkin I see that there are multiple templates for ImageNet datasets under https://github.com/ml-jku/MIM-Refiner/tree/main/src/zztemplates/datasets/imagenet. What are the criteria for choosing a given template for the train set?