vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

Linear Evaluation is too slow #368

Closed LightingMc closed 1 year ago

LightingMc commented 1 year ago

Describe the bug

Not sure what the issue is. Linear evaluation is really slow, though.

To Reproduce

```python
import torch
import torch.nn as nn
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.callbacks import LearningRateMonitor
from torchvision.models import resnet18
from omegaconf import OmegaConf
from solo.methods.linear import LinearModel  # imports the linear eval class
from solo.data.classification_dataloader import prepare_data
from solo.utils.checkpointer import Checkpointer

# import paths tried for other pytorch-lightning versions:
# from pytorch_lightning.plugins.training_type import DDPPlugin
# from pytorch_lightning.plugins import DDPPlugin
from pytorch_lightning.strategies import DDPStrategy as DDPPlugin

import glob
from solo.methods.base_simple import model  # custom module providing the backbone

kwargs = OmegaConf.create({
    "name": "TestingSetUp",
    "num_classes": 10,
    "cifar": True,  # ??
    "max_epochs": 100,
    "optimizer": "sgd",
    "precision": 16,  # ??
    "lars": False,  # ??
    "lr": 0.1,
    "exclude_bias_n_norm_lars": False,  # ??
    "gpus": "0",  # ??
    "weight_decay": 0,
    "extra_optimizer_args": {"momentum": 0.9},
    "scheduler": "step",
    "lr_decay_steps": [60, 80],
    "batch_size": 128,
    "num_workers": 4,
    "pretrained_feature_extractor": "SoloLearn-Weights/Weights-CIFAR10-SoloLearn/BT/barlow-cifar10-otu5cw89-ep=999.ckpt",
})

conf = OmegaConf.create({
    "data": {"num_classes": 10},
    "max_epochs": 100,
    "optimizer": {
        "name": "sgd",
        "batch_size": 128,
        "lr": 0.1,
        "weight_decay": 0,
        "kwargs": {},
        "exclude_bias_n_norm_wd": False,
        "exclude_bias_n_norm_lars": False,
        "layer_decay": 0.0,
        "lars": False,
        "precision": 16,
        "num_workers": 4,
    },
    "scheduler": {
        "name": "step",
        "min_lr": 0.0,
        "warmup_start_lr": 3e-5,
        "warmup_epochs": 10,
        "lr_decay_steps": [60, 80],
        "interval": "step",
    },
    "finetune": False,
    "performance": {"disable_channel_last": False},
    # "loss_func": ...,
    "accumulate_grad_batches": 1,
    "pretrained_feature_extractor": "SoloLearn-Weights/Weights-CIFAR10-SoloLearn/BT/barlow-cifar10-otu5cw89-ep=999.ckpt",
})

backbone = model()

model = LinearModel(backbone, conf)
model = model.cuda()
train_loader, val_loader = prepare_data(
    "cifar10",
    train_data_path=None,
    val_data_path=None,
    batch_size=conf.optimizer.batch_size,
    num_workers=conf.optimizer.num_workers,
)

wandb_logger = WandbLogger(
    name="linear-cifar10-Barlow",  # name of the experiment
    project="self-supervised",  # name of the wandb project
    entity=None,
    offline=False,
)
wandb_logger.watch(model, log="gradients", log_freq=100)

callbacks = []

# automatically log our learning rate
lr_monitor = LearningRateMonitor(logging_interval="epoch")
callbacks.append(lr_monitor)

# checkpointer can automatically log your parameters,
# but we need to wrap them in a Namespace object
from argparse import Namespace
# args = Namespace(**kwargs)  # Namespace(kwargs) raises TypeError; overridden below anyway
args = kwargs

# saves the checkpoint after every epoch
ckpt = Checkpointer(
    args,
    logdir="checkpoints/linear",
    frequency=1,
)
callbacks.append(ckpt)

trainer = Trainer.from_argparse_args(
    args,
    logger=wandb_logger,  # if args.wandb else None
    callbacks=callbacks,
    plugins=DDPPlugin(find_unused_parameters=False),
    checkpoint_callback=False,
    terminate_on_nan=True,
    max_epochs=100,
)

trainer.fit(model, train_loader, val_loader)
```
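As a sanity check, timing the dataloader in isolation separates a data-loading bottleneck from a slow training step. This is a minimal sketch, reusing the `train_loader` built above:

```python
# Minimal timing sketch: iterate the dataloader alone (no model, no GPU work)
# to tell a data-loading bottleneck apart from a slow training step.
import time

it = iter(train_loader)
next(it)  # skip the first batch; worker startup inflates its timing
n_batches = 20
start = time.perf_counter()
for _ in range(n_batches):
    next(it)
print(f"{(time.perf_counter() - start) / n_batches:.3f}s per batch (data only)")
```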

Versions

Pre-trained with pytorch-lightning 2.0.2, evaluating with pytorch-lightning 1.6.
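Since checkpoints are being mixed across major pytorch-lightning versions, a quick environment check (a minimal sketch) confirms which versions are actually active in the evaluation environment:

```python
# Sketch: print the active library versions, since checkpoints written under
# pytorch-lightning 2.x may not load cleanly under 1.6.
import pytorch_lightning as pl
import torch

print("pytorch-lightning:", pl.__version__)
print("torch:", torch.__version__)
```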

vturrisi commented 1 year ago

Hey. I can't properly understand what's going on in your question. Can you reformat it? Also, what does "too slow" mean? Did you run the code before and see a decrease in performance?

vturrisi commented 1 year ago

I'm closing because there's no proper info. Feel free to re-open.

LightingMc commented 1 year ago

I'm sorry for not getting back to you sooner. I was busy. Here is the reformatted code.

Earlier runs: it was way faster, and I could get my results within an hour.

Now it takes forever to run: up to 20 seconds per iteration.

I should mention that there were issues with pytorch-lightning 2.0.2 during evaluation, so I had to make a separate environment with pytorch-lightning ~1.6.0.
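One way to localize a 20 s/iteration stall is pytorch-lightning's built-in profiler, which reports time spent per hook. A sketch, assuming the 1.6 API (`profiler="simple"` is available there) and the `model`, `train_loader`, and `val_loader` from the script above:

```python
# Sketch: run one epoch with the simple profiler to see where time goes
# (data loading, forward, optimizer step, logging, ...).
# Assumes pytorch-lightning 1.6.
from pytorch_lightning import Trainer

trainer = Trainer(
    profiler="simple",  # prints a per-hook timing summary when fit() ends
    max_epochs=1,       # one epoch is enough to localize the bottleneck
    gpus=1,
)
trainer.fit(model, train_loader, val_loader)
```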