mx-mark / VideoTransformer-pytorch

PyTorch implementation of a collection of scalable Video Transformer benchmarks.
272 stars · 34 forks

How do we load ImageNet-21k ViT weights? #19

Closed Darktex closed 2 years ago

Darktex commented 2 years ago

Hi guys, thanks for open sourcing this repo!

I see that your pretrained K600 models were initialized from the ViT ImageNet-21k weights. Can you share a snippet on how you initialized them? Did you use the models from timm?

Thanks!

mx-mark commented 2 years ago

@Darktex The code can be found in https://github.com/mx-mark/VideoTransformer-pytorch/blob/194cae69722eb5efad031c59f4ff03bc60633fa8/weight_init.py#L107
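For context, a common way to reuse image-pretrained ViT weights in a video transformer (used by several video transformer implementations; see the linked `weight_init.py` for this repo's exact code) is to copy the 2D patch-embedding kernel along a new temporal axis and rescale it so the initial output matches the 2D model on a static clip. A rough sketch of that idea, not the repo's actual function:

```python
import torch


def inflate_patch_embed(weight_2d: torch.Tensor, t_kernel: int) -> torch.Tensor:
    """Inflate a 2D conv kernel (out, in, kh, kw) to a 3D one
    (out, in, kt, kh, kw) by replicating over time and dividing by kt,
    so a temporally-constant input produces the same activations."""
    return weight_2d.unsqueeze(2).repeat(1, 1, t_kernel, 1, 1) / t_kernel


# Dummy tensor standing in for ViT-Base patch-embed weights (768, 3, 16, 16).
w2d = torch.randn(768, 3, 16, 16)
w3d = inflate_patch_embed(w2d, t_kernel=2)
print(w3d.shape)  # (768, 3, 2, 16, 16)
```

Summing the inflated kernel over its temporal dimension recovers the original 2D kernel, which is why the initialization is output-preserving on static inputs. The remaining transformer blocks (attention, MLP, norms) are typically copied over unchanged by state-dict key matching.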

mx-mark commented 2 years ago

@Darktex The pretrained model was converted by others from the original repo https://github.com/google-research/vision_transformer.

Darktex commented 2 years ago

Thank you!