vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

Training output format help #272

Closed gopalaniyengar closed 2 years ago

gopalaniyengar commented 2 years ago

Upon running the following script for training Barlow Twins on a custom dataset in a Colab Notebook:

```bash
!python3 main_pretrain.py \
    --dataset custom \
    --data_dir /content/drive/MyDrive/ \
    --train_dir /content/drive/MyDrive/train/ \
    --val_dir /content/drive/MyDrive/test/ \
    --dali \
    --brightness 0.4 \
    --contrast 0.4 \
    --saturation 0.2 \
    --hue 0.1 \
    --gaussian_prob 0.0 \
    --solarization_prob 0.0 \
    --backbone resnet18 \
    --max_epochs 1000 \
    --devices 0 \
    --accelerator gpu \
    --num_workers 2 \
    --optimizer sgd \
    --grad_clip_lars \
    --eta_lars 0.02 \
    --exclude_bias_n_norm \
    --scheduler warmup_cosine \
    --lr 0.01 \
    --weight_decay 1e-4 \
    --batch_size 64 \
    --name barlow-twins \
    --project SSL \
    --wandb \
    --save_checkpoint \
    --method barlow_twins \
    --proj_hidden_dim 2048 \
    --proj_output_dim 2048 \
    --scale_loss 0.1
```

the cell displays an output like this:

[screenshot of the notebook cell output showing the training progress bars]

I would appreciate some insight into what exactly is going on, mainly why the training and validation steps' progress counters are intertwined, and what the entity 'v_num' represents.

vturrisi commented 2 years ago

Hey. We never run anything on Colab, so take what I say with a grain of salt. The training and validation steps are not actually intertwined: each epoch's progress bar also covers validation at the end, i.e., the last batches of the 243 counter are validation ones. As far as I know, v_num doesn't mean anything; it's just something PyTorch Lightning adds to the progress bar. Check what changes you need to make to properly display the tqdm progress bars on Colab.
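If you end up editing the Trainer setup yourself, a minimal sketch of tuning the progress bar with PyTorch Lightning's built-in callback (plain Lightning, not a solo-learn flag) would be something like:

```python
# Sketch only: throttle how often the tqdm bar refreshes so it renders more
# cleanly in a Colab output cell. You would need to add this where the Trainer
# is built in main_pretrain.py.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import TQDMProgressBar

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=1000,
    callbacks=[TQDMProgressBar(refresh_rate=20)],  # redraw every 20 batches
)
```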

vturrisi commented 2 years ago

Feel free to reopen the issue if you have any more questions!

gopalaniyengar commented 2 years ago

I have a few more basic questions. Could you please clarify whether my understanding is correct:

  1. --auto_umap will not function without class information.
  2. With a training and validation dataset separated by class (subdirectories like train/class1/..., train/classn/..., val/class1/..., val/classn/...), the label information should be used for nothing but the online UMAP and online linear evaluation, for both the train and validation sets.
  3. Online UMAP and online linear evaluation should work even without a validation set specified.
  4. From the base method implementation, I gather that the backbone's outputs at each step are fed into both the linear classifier and the projector, and backprop is performed using both the projector loss (e.g. the Barlow loss) and the classifier loss. Does this not subvert the point of SSL, where a pretext task (feature invariance under augmentation) is the objective and classification or segmentation is left as a downstream task? Here, the classification task is being performed during training, which may induce downstream-task-specific biases in the model. Am I understanding this correctly?

Suggestion: normalize the Barlow loss by the dimension of the layer where the loss is calculated, for a better comparison of performance across different representation dimensions. A rough sketch of what I mean is below.
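This is only an illustration of the idea (a standalone re-implementation, not solo-learn's code; `lamb` is the usual off-diagonal weight):

```python
# Illustrative sketch: a Barlow Twins loss divided by the projector output
# dimension D, so that runs with different --proj_output_dim values produce
# losses on a roughly comparable scale.
import torch

def barlow_loss_dim_normalized(z1, z2, lamb=5e-3):
    N, D = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)          # standardize each dimension
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / N                         # D x D cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return (on_diag + lamb * off_diag) / D      # <- the proposed normalization
```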

Thank you!

gopalaniyengar commented 2 years ago

@vturrisi I am unsure how to re-open this issue; it seems I do not have the privileges. Thank you

gopalaniyengar commented 2 years ago

Also, is there any way to enable storage of model weights on WandB as artifacts?

vturrisi commented 2 years ago

Hi,

  1. --auto_umap will not function without class information.

Yes, indeed it won't work without a validation dataset because we only implemented it with `on_validation_end`. You can change this behavior by making the UMAP work on training data.
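A rough sketch of that direction, as a plain Lightning callback using umap-learn (the user-supplied dataloader and the `pl_module.backbone` attribute are assumptions you would need to adapt; this is not the existing AutoUMAP callback):

```python
# Sketch only: fit a UMAP on training-set features at the end of each training
# epoch instead of relying on on_validation_end.
import torch
import umap
from pytorch_lightning.callbacks import Callback

class TrainUMAP(Callback):
    def __init__(self, dataloader):
        # dataloader: a plain (image, label) loader over the training set,
        # supplied by the user; hypothetical, not a solo-learn argument.
        self.dataloader = dataloader

    def on_train_epoch_end(self, trainer, pl_module):
        feats, targets = [], []
        pl_module.eval()
        with torch.no_grad():
            for x, y in self.dataloader:
                feats.append(pl_module.backbone(x.to(pl_module.device)).cpu())
                targets.append(y)
        pl_module.train()
        embedding = umap.UMAP(n_components=2).fit_transform(torch.cat(feats).numpy())
        # plot/log `embedding` colored by torch.cat(targets) here
```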

  2. With a training and validation dataset separated by class (subdirectories like train/class1/..., train/classn/..., val/class1/..., val/classn/...), the label information should be used for nothing but the online UMAP and online linear evaluation, for both the train and validation sets.
  3. Online UMAP and online linear evaluation should work even without a validation set specified.

Online linear evaluation is mostly useful for evaluating the downstream performance on the validation dataset, and not the training one. However, I still think that if you do not specify a validation dataset, you will get the classification accuracy on the training dataset.

  4. From the base method implementation, I gather that the backbone's outputs at each step are fed into both the linear classifier and the projector, and backprop is performed using both the projector loss (e.g. the Barlow loss) and the classifier loss. Does this not subvert the point of SSL, where a pretext task (feature invariance under augmentation) is the objective and classification or segmentation is left as a downstream task? Here, the classification task is being performed during training, which may induce downstream-task-specific biases in the model. Am I understanding this correctly?

I'm not sure what you mean exactly, but the classification loss doesn't backprop through the backbone, as we have a detach() operation.
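For context, the pattern is roughly the following (a simplified sketch, not the exact solo-learn code):

```python
# Simplified sketch of the online-classifier pattern: the classifier is trained
# on detached features, so its cross-entropy loss never updates the backbone.
import torch.nn.functional as F

def training_step_sketch(backbone, projector, classifier, barlow_loss_fn,
                         x1, x2, targets):
    f1, f2 = backbone(x1), backbone(x2)
    ssl_loss = barlow_loss_fn(projector(f1), projector(f2))  # backprops into backbone
    logits = classifier(f1.detach())                         # gradient stops here
    cls_loss = F.cross_entropy(logits, targets)              # trains the classifier only
    return ssl_loss + cls_loss
```

So the online classifier rides along for monitoring, but the backbone is shaped only by the SSL objective.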

Also, is there any way to enable storage of model weights on WandB as artifacts?

Not via any parameters, but you can probably work around that by passing the wandb logger to the checkpointer and saving the checkpoints there.
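One concrete way to do that workaround (standard PyTorch Lightning + wandb functionality, not a solo-learn flag) is to let the `WandbLogger` upload the files produced by `ModelCheckpoint` as artifacts:

```python
# Sketch only: WandbLogger(log_model=...) uploads ModelCheckpoint files to W&B
# as artifacts. This would require editing the Trainer setup in main_pretrain.py.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="SSL", name="barlow-twins", log_model="all")
ckpt = ModelCheckpoint(dirpath="checkpoints/", every_n_epochs=50, save_top_k=-1)

trainer = pl.Trainer(logger=wandb_logger, callbacks=[ckpt], max_epochs=1000)
```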

gopalaniyengar commented 2 years ago

the classification loss doesn't backprop through the backbone, as we have a detach() operation

Thanks for clarifying.

Not via any parameters, but you can probably work around that by passing the wandb logger to the checkpointer and saving the checkpoints there.

I will try this. I understand now, thanks a lot.