kjm1559 / data2vec

data2vec with vit
MIT License

Data2Vec

Architecture

The data2vec base network is a ViT (Vision Transformer).
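
A ViT consumes an image as a sequence of flattened patches. The snippet below is a minimal sketch of that step in plain NumPy rather than TensorFlow, so it stays framework-agnostic. The function name `patchify` and the patch size of 4 are illustrative assumptions, not values taken from this repository.

```python
import numpy as np


def patchify(images, patch_size):
    """Split images (N, H, W, C) into flattened patches (N, num_patches, patch_size * patch_size * C)."""
    n, h, w, c = images.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Cut each spatial axis into (grid, patch) pairs, then group the two grid axes together.
    x = images.reshape(n, gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (n, gh, gw, patch, patch, c)
    return x.reshape(n, gh * gw, patch_size * patch_size * c)


# Dummy batches with the standard MNIST and CIFAR-10 image shapes.
mnist_like = np.zeros((2, 28, 28, 1), dtype=np.float32)
cifar_like = np.zeros((2, 32, 32, 3), dtype=np.float32)

mnist_tokens = patchify(mnist_like, patch_size=4)  # assumed patch size
cifar_tokens = patchify(cifar_like, patch_size=4)  # assumed patch size
print(mnist_tokens.shape, cifar_tokens.shape)
```

With this assumed patch size, a 28x28 MNIST image becomes a sequence of 49 tokens and a 32x32 CIFAR-10 image becomes a sequence of 64 tokens, which is the form a Transformer encoder expects.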

Problem definition

We use the MNIST and CIFAR-10 datasets to show that the framework works well in practice.
Any model can adopt this framework through a self-supervised sequence module.
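
To make the self-supervised objective concrete, here is a minimal NumPy sketch of the general data2vec recipe described in [1]: a teacher whose weights are an exponential moving average (EMA) of the student's, targets built by averaging the normalized outputs of the teacher's top K layers, and a regression loss computed only on masked positions. All function names, tensor shapes, the mask ratio, and the choice of Smooth L1 loss are assumptions made for illustration; they are not taken from this repository's code.

```python
import numpy as np


def ema_update(teacher, student, tau):
    """Return teacher weights updated as tau * teacher + (1 - tau) * student."""
    return {k: tau * teacher[k] + (1.0 - tau) * student[k] for k in teacher}


def build_targets(layer_outputs, top_k):
    """Average the per-token normalized outputs of the last top_k teacher layers."""
    normed = []
    for h in layer_outputs[-top_k:]:
        mean = h.mean(axis=-1, keepdims=True)
        std = h.std(axis=-1, keepdims=True)
        normed.append((h - mean) / (std + 1e-6))
    return np.mean(normed, axis=0)


def smooth_l1(pred, target, beta=1.0):
    """Element-wise Smooth L1: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    return np.where(diff <= beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)


def masked_loss(pred, target, mask, beta=1.0):
    """Mean Smooth L1 loss over masked tokens only (mask value 1 = masked)."""
    per_token = smooth_l1(pred, target, beta).mean(axis=-1)  # (N, T)
    return float((per_token * mask).sum() / max(mask.sum(), 1.0))


# Stand-in tensors: 2 images, 49 patch tokens, 8 features, 4 teacher layers (all hypothetical).
rng = np.random.default_rng(0)
teacher_layers = [rng.normal(size=(2, 49, 8)) for _ in range(4)]
targets = build_targets(teacher_layers, top_k=2)
student_pred = rng.normal(size=(2, 49, 8))
mask = (rng.random((2, 49)) < 0.6).astype(np.float64)
loss = masked_loss(student_pred, targets, mask)

# After each student step, the teacher drifts slowly toward the student.
new_teacher = ema_update({"w": np.ones(3)}, {"w": np.zeros(3)}, tau=0.9)
```

In a real training loop the student would receive the masked patch sequence and the teacher the unmasked one; only the student is updated by gradient descent, while the teacher follows through `ema_update`.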

Result

Environment

tensorflow = 2.10

Reference

[1] Baevski, Alexei, et al. "Data2vec: A general framework for self-supervised learning in speech, vision and language." arXiv preprint arXiv:2202.03555 (2022).
[2] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).