vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

Error while using `main_umap.py` #340

Closed saifhassan closed 1 year ago

saifhassan commented 1 year ago

Hey @vturrisi @DonkeyShot21

Thank you for the nice work.

When I run main_umap.py with a model pretrained using the BYOL method, it gives the following error:

Command

python main_umap.py --pretrained_checkpoint_dir trained_models/old/byol/custom/ --batch_size=16 --num_workers 2 --dataset custom --train_data_path ./datasets/custom/train --val_data_path ./datasets/custom/val

Error

Traceback (most recent call last):
  File "main_umap.py", line 72, in <module>
    main()
  File "main_umap.py", line 45, in main
    .load_from_checkpoint(ckpt_path, strict=False, **method_args)
  File "/home/user1/anaconda3/lib/python3.7/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
TypeError: _load_model_state() got multiple values for argument 'checkpoint'

Please guide.

vturrisi commented 1 year ago

@saifhassan please don't tag everyone when posting new issues. The main contributors are myself and @DonkeyShot21 and we always try to address issues as soon as possible. Which files do you have inside this folder? main_umap.py expects a folder in the same structure as the checkpoints that we save during training, so a checkpoint file and an args file. It looks like you have more than one checkpoint file.
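The folder layout described above (one checkpoint file plus one args file) can be checked before launching the script. The following is only a minimal sketch: the helper name `find_checkpoint_files` and the `*.ckpt` / `*.json` patterns are assumptions for illustration and are not taken from solo-learn.

```python
from pathlib import Path
import tempfile


def find_checkpoint_files(ckpt_dir):
    """Return (ckpt_path, args_path) if the folder holds exactly one of each.

    Hypothetical helper: the file patterns are assumptions, not solo-learn's.
    """
    ckpt_dir = Path(ckpt_dir)
    ckpts = sorted(ckpt_dir.glob("*.ckpt"))
    args_files = sorted(ckpt_dir.glob("*.json"))
    if len(ckpts) != 1 or len(args_files) != 1:
        raise ValueError(
            f"expected one .ckpt and one .json in {ckpt_dir}, "
            f"found {len(ckpts)} and {len(args_files)}"
        )
    return ckpts[0], args_files[0]


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.ckpt").touch()
    (Path(tmp) / "args.json").touch()
    ckpt_path, args_path = find_checkpoint_files(tmp)
    print(ckpt_path.name, args_path.name)
```

A check like this would rule out the "more than one checkpoint file" explanation quickly, since it fails loudly when the folder contains extra or missing files.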

saifhassan commented 1 year ago

Sorry for tagging everyone; I have updated the tags. Yes, I understand the point about UMAP. I will let you know if I have any more questions.

Thanks

saifhassan commented 1 year ago

I am running the umap.sh file, which contains the following command:

python3 main_umap.py \
    --dataset custom \
    --train_data_path ./datasets/raf-db/train \
    --val_data_path ./datasets/raf-db/val \
    --batch_size 16 \
    --num_workers 10 \
    --pretrained_checkpoint_dir ./trained_models/byol/zu2661zo

The ./trained_models/byol/zu2661zo directory contains only a .json file and a .ckpt file.

I still get the same error.

vturrisi commented 1 year ago

How many files do you have in that folder?

saifhassan commented 1 year ago

Just two files: one .json and one .ckpt.

saifhassan commented 1 year ago

Only the files generated by main_pretrain.py (one .ckpt and one .json).

2hindas commented 1 year ago

I am trying to use KNN evaluation, and the exact same error shows up. In my case, I also only have a folder containing just a checkpoint file and an args.json file.

davidrzs commented 1 year ago

I have exactly the same issue. I tried to debug it, but I am stuck on why this would suddenly appear.

It seems that in the saving.py provided by pytorch_lightning (lines 177 ff.), one can circumvent the issue as follows:

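    if issubclass(cls, pl.LightningDataModule):
        return _load_state(cls, checkpoint, **kwargs)
    if issubclass(cls, pl.LightningModule):
        # Workaround: drop any 'checkpoint' entry from kwargs so it does not
        # clash with the positional `checkpoint` argument of _load_state.
        kwargs.pop('checkpoint', None)
        return _load_state(cls, checkpoint, strict=strict, **kwargs)
    raise NotImplementedError(f"Unsupported {cls}")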

The kwargs.pop('checkpoint') call partially solves my problem, though I find it odd that this has suddenly become an issue.
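To illustrate why popping the key helps, here is a minimal sketch that reproduces the same kind of TypeError without touching pytorch_lightning. It assumes the saved arguments contain an entry named `checkpoint`, which the workaround above suggests but the thread does not confirm. `load_model_state`, `drop_reserved`, and the contents of `method_args` are hypothetical stand-ins, not the library's or solo-learn's code.

```python
def load_model_state(checkpoint, strict=True, **kwargs):
    """Toy stand-in for a loader whose first parameter is named `checkpoint`."""
    return {"checkpoint": checkpoint, "strict": strict, "extra": kwargs}


def drop_reserved(args, reserved=("checkpoint",)):
    """Return a copy of args without keys that clash with loader parameters."""
    return {k: v for k, v in args.items() if k not in reserved}


# Hypothetical saved arguments that happen to include a "checkpoint" entry.
method_args = {"lr": 0.1, "checkpoint": {"enabled": True}}

# Passing the dict as-is supplies `checkpoint` twice: once positionally and
# once through **method_args, so Python raises a TypeError.
try:
    load_model_state("weights.ckpt", strict=False, **method_args)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Filtering the clashing key in the calling code avoids the error without
# editing the installed library.
state = load_model_state("weights.ckpt", strict=False, **drop_reserved(method_args))
print(state["extra"])
```

Filtering in the calling script rather than in saving.py keeps the installed package unmodified, which may be easier to maintain until a proper fix is released.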

vturrisi commented 1 year ago

I'll take a look at it in the next few days. Thanks for reporting.

vturrisi commented 1 year ago

#344 will fix this issue.