lightly-ai / lightly

A python library for self-supervised learning on images.
https://docs.lightly.ai/self-supervised-learning/

Training the SwaV model on a custom dataset: the .ckpt file size is the same for every saved checkpoint #1521

Closed ad5014858 closed 3 months ago

ad5014858 commented 3 months ago

Hi, I am a beginner. When training on a custom dataset and using the ModelCheckpoint callback to save a checkpoint after every epoch, I found that the .ckpt file size is the same for each epoch.

The code is as follows:

import pytorch_lightning as pl
import torch
import torchvision
from PIL import Image
from torch import nn
from torch.utils.data import Dataset
import torchvision.transforms as transforms
from pytorch_lightning.callbacks import ModelCheckpoint
from lightly.loss import SwaVLoss
from lightly.models.modules import SwaVProjectionHead, SwaVPrototypes
from lightly.models.modules.memory_bank import MemoryBankModule
from lightly.transforms.swav_transform import SwaVTransform
from lightly.data import LightlyDataset

class SwaV(pl.LightningModule):
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet18()
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.projection_head = SwaVProjectionHead(512, 512, 128)
        self.prototypes = SwaVPrototypes(128, n_prototypes=512)
        self.criterion = SwaVLoss()

    def forward(self, x):
        x = self.backbone(x).flatten(start_dim=1)
        x = self.projection_head(x)
        x = nn.functional.normalize(x, dim=1, p=2)
        p = self.prototypes(x)
        return p

    def training_step(self, batch, batch_idx):
        self.prototypes.normalize()
        views = batch[0]
        multi_crop_features = [self.forward(view.to(self.device)) for view in views]
        high_resolution = multi_crop_features[:2]
        low_resolution = multi_crop_features[2:]
        loss = self.criterion(high_resolution, low_resolution)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optim = torch.optim.Adam(self.parameters(), lr=0.001)
        return optim

model = SwaV()
path_to_data = '/root/LightlyAI/lightly/data'
transform = SwaVTransform()
dataset_train_swav = LightlyDataset(input_dir=path_to_data, transform=transform)

dataloader = torch.utils.data.DataLoader(
    dataset_train_swav,
    batch_size=64,
    shuffle=True,
    drop_last=True,
    num_workers=0,
)
# Save a checkpoint after every epoch; save_top_k=-1 keeps all of them instead of only the best.
checkpoint_callback = ModelCheckpoint(
    monitor='train_loss',
    save_top_k=-1,
    dirpath='./Output/',
    filename='{epoch}',
    every_n_epochs=1,
    save_weights_only=True,
)

accelerator = "gpu" if torch.cuda.is_available() else "cpu"
trainer = pl.Trainer(max_epochs=500, devices=1, accelerator=accelerator, callbacks=[checkpoint_callback])
trainer.fit(model=model, train_dataloaders=dataloader)

(Screenshot: the saved .ckpt files, all showing the same file size.)

I've searched in many places but can't find the reason. Could you tell me why this happens? Thanks.

guarin commented 3 months ago

Hi, the checkpoint size should stay pretty much the same over multiple epochs because the same number of weights is stored in each checkpoint. The size should only change if you add or remove layers from your model during training. However, the values stored in the weights should change. You can verify this by loading two checkpoints and comparing the state_dict in them.
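As a minimal sketch (not from the original thread), the comparison could look like this, assuming the ModelCheckpoint callback above wrote ./Output/epoch=0.ckpt and ./Output/epoch=1.ckpt (hypothetical file names based on the filename='{epoch}' pattern):

import torch

# Load two checkpoints saved after different epochs (paths are assumed, not from the thread).
ckpt_a = torch.load('./Output/epoch=0.ckpt', map_location='cpu', weights_only=False)
ckpt_b = torch.load('./Output/epoch=1.ckpt', map_location='cpu', weights_only=False)

state_a = ckpt_a['state_dict']
state_b = ckpt_b['state_dict']

# Both checkpoints contain the same set of tensors, which is why the files
# have (almost) exactly the same size ...
assert state_a.keys() == state_b.keys()

# ... but the tensor values should differ if training is actually updating the weights.
changed = [k for k in state_a if not torch.equal(state_a[k], state_b[k])]
print(f'{len(changed)} of {len(state_a)} tensors changed between the two checkpoints')

If most tensors are reported as changed, training is progressing normally; the constant file size simply reflects that the number and shapes of the stored weights do not change between epochs.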

ad5014858 commented 3 months ago


Thank you very much for your reply; it resolved my doubts.