lightly-ai / lightly

A python library for self-supervised learning on images.
https://docs.lightly.ai/self-supervised-learning/
MIT License

Change the backbone of the SimSiam to ViT #1570

Open s9021025292140 opened 3 days ago

s9021025292140 commented 3 days ago

Could you please advise on how to change the backbone of the SimSiam example to ViT? (https://docs.lightly.ai/self-supervised-learning/tutorials/package/tutorial_simsiam_esa.html)

Additionally, for the DINO (https://docs.lightly.ai/self-supervised-learning/examples/dino.html) and AIM (https://docs.lightly.ai/self-supervised-learning/examples/aim.html) examples, which use the PascalVOC dataset, how can I modify them to use my own classification dataset (with multiple folders, each containing images for a specific class)?

Thanks!

guarin commented 3 days ago

Hi!

> Could you please advise on how to change the backbone of the SimSiam example to ViT? (https://docs.lightly.ai/self-supervised-learning/tutorials/package/tutorial_simsiam_esa.html)

If you use the newest timm version, you have to make the following changes:

import timm
from torch.nn import Module

# use a ViT backbone from timm instead of the ResNet from the tutorial
backbone = timm.create_model("vit_tiny_patch16_224")

# the ViT feature dimension, used to build the projection/prediction heads
num_ftrs = backbone.num_features

class SimSiam(Module):
    def forward(self, x):
        # get representations: forward_features returns the token sequence,
        # pool reduces it to one feature vector per image
        f = self.backbone.forward_features(x)
        f = self.backbone.pool(f).flatten(start_dim=1)
        # code below should be the same
        ...
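
For reference, a quick shape check of the modified forward pass. This is only a minimal sketch; it assumes a recent timm release where VisionTransformer exposes forward_features and pool as used above, and the shapes in the comments refer to the default vit_tiny_patch16_224 configuration:

import timm
import torch

backbone = timm.create_model("vit_tiny_patch16_224")
x = torch.randn(2, 3, 224, 224)                     # dummy batch of two RGB images
tokens = backbone.forward_features(x)               # token sequence, e.g. (2, 197, 192)
feats = backbone.pool(tokens).flatten(start_dim=1)  # pooled features, e.g. (2, 192)
assert feats.shape[1] == backbone.num_features      # matches num_ftrs above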

# and for the embedding part:
embeddings = []
filenames = []

with torch.no_grad():
    for i, (x, _, fnames) in enumerate(dataloader_test):
        # move the images to the gpu
        x = x.to(device)
        # embed the images with the pre-trained backbone
        y = model.backbone.forward_features(x)
        y = model.backbone.pool(y).flatten(start_dim=1)
        # store the embeddings and filenames in lists
        embeddings.append(y)
        filenames = filenames + list(fnames)
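
The rest of the embedding step can stay as in the tutorial, i.e. concatenating and normalizing the collected embeddings afterwards; roughly:

from torch.nn.functional import normalize

embeddings = torch.cat(embeddings, dim=0)  # one row per image, num_ftrs columns
embeddings = normalize(embeddings)         # l2-normalize for nearest-neighbor search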

I would also recommend switching the optimizer from SGD to AdamW. You might also have to adjust the hyperparameters a bit to make the model train well; ViTs are usually a bit tricky to get working.
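
A minimal sketch of that swap (model is the SimSiam model from the tutorial; the lr and weight_decay values are only illustrative starting points, not tuned settings):

import torch

# replace the SGD optimizer from the tutorial with AdamW
# (lr and weight_decay are illustrative starting points and will likely need tuning)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)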

> Additionally, for the DINO (https://docs.lightly.ai/self-supervised-learning/examples/dino.html) and AIM (https://docs.lightly.ai/self-supervised-learning/examples/aim.html) examples, which use the PascalVOC dataset, how can I modify them to use my own classification dataset (with multiple folders, each containing images for a specific class)?

You can use LightlyDataset, which loads all images from a folder (nested per-class subfolders are included automatically):

from lightly.data import LightlyDataset

dataset = LightlyDataset("path/to/folder")
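
To plug this into the DINO example, pass the example's transform to the dataset and build the dataloader from it. A minimal sketch, assuming the DINOTransform and dataloader settings from the linked example (the AIM example can be adapted the same way with its AIMTransform):

import torch
from lightly.data import LightlyDataset
from lightly.transforms.dino_transform import DINOTransform

# build the multi-crop transform used in the DINO example and apply it to your own folder;
# the class labels from the subfolders are not needed for self-supervised pretraining
transform = DINOTransform()
dataset = LightlyDataset("path/to/folder", transform=transform)

dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    drop_last=True,
    num_workers=8,
)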