vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License
1.38k stars 181 forks

Knn doesn't run #373

Closed dr4thmos closed 7 months ago

dr4thmos commented 7 months ago

Describe the bug
Running main_knn.py gives this error:

```
Exception has occurred: TypeError
_load_state() got multiple values for argument 'checkpoint'
  File "/.../main_knn.py", line 132, in main
    model = METHODS[method_args["method"]].load_from_checkpoint(
  File "/.../main_knn.py", line 184, in <module>
    main()

TypeError: _load_state() got multiple values for argument 'checkpoint'
```
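This message comes from ordinary Python argument binding. If a dict unpacked with `**` contains a key that matches a parameter already filled positionally, the call fails before the function body runs. Below is a minimal sketch that assumes a hypothetical stand-in `load_state`. Its signature only mimics the `_load_state(cls, checkpoint, strict=strict, **kwargs)` call visible in the traceback further down this thread, and it is not Lightning's actual code:

```python
def load_state(cls, checkpoint, strict=None, **kwargs):
    """Hypothetical stand-in mirroring the call shape seen in the traceback."""
    return cls, checkpoint, strict, kwargs


def reproduce(method_args):
    """Call load_state the way a **-unpacked args dict would reach it.

    Returns the TypeError message if binding fails, otherwise None.
    """
    try:
        # "checkpoint" is filled positionally here; a "checkpoint" key in
        # method_args would try to fill it a second time.
        load_state(object, {"state_dict": {}}, strict=False, **method_args)
    except TypeError as err:
        return str(err)
    return None


# args.json-style dict that happens to contain a top-level "checkpoint" key
method_args = {"method": "byol", "checkpoint": {"enabled": True, "dir": "trained_models"}}
print(reproduce(method_args))
# -> load_state() got multiple values for argument 'checkpoint'
```

Without the `checkpoint` key, `method` simply lands in `**kwargs` and the call binds fine.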

To Reproduce

```
main_knn.py --dataset rgz --train_data_path ../RGZ-D1-smorph-dataset --val_data_path ../RGZ-D1-smorph-dataset --pretrained_checkpoint_dir ./trained_models/byol/ddn5f8w5 --k 1 2 5 10 20 50 100 200 --temperature 0.01 0.02 0.05 0.07 0.1 0.2 0.5 1. --feature_type backbone projector --distance_function euclidean cosine
```

I'm using a custom dataset, but it doesn't work with CIFAR-10 either.

Versions

- Repository at tag 1.0.7
- torch==1.13.1
- torchvision==0.14.0
- pytorch-lightning==2.0.2 (also tried 2.0.9 and 2.1.1)

Additional comments
Thanks for the repo, it's very useful ^^ I tried to debug the issue a little, but without success.

vturrisi commented 7 months ago

Hey. The issue is probably that the json file contains a `checkpoint` key, which gets passed along here: https://github.com/vturrisi/solo-learn/blob/d27c7130d19035c0ba0af8f90217e78d8ebe7f48/main_knn.py#L127C5-L133. Can you print `method_args` and share it with me?

dr4thmos commented 7 months ago

Yes, there is:

`'checkpoint': {'enabled': True, 'dir': 'trained_models', 'frequency': 25, 'keep_prev': False}`

It's probably the one saved from the pretraining config file:

```yaml
checkpoint:
  enabled: True
  dir: "trained_models"
  frequency: 25
```
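You don't have to pop keys by hand and retry. You can list the clashing entries by comparing the loaded dict against the target function's named parameters with `inspect.signature`. Below is a minimal sketch that assumes a hypothetical helper `colliding_keys` and a stand-in `load_state`. The stand-in's parameter names are copied from the `_load_state(cls, checkpoint, strict=strict, **kwargs)` call in the traceback, and it is not Lightning's actual function:

```python
import inspect


def load_state(cls, checkpoint, strict=None, **kwargs):
    """Hypothetical stand-in with the parameter names seen in the traceback."""
    return cls, checkpoint, strict, kwargs


def colliding_keys(func, args_dict):
    """Return keys of args_dict that would clash with func's named parameters.

    *args / **kwargs catch-alls are ignored, since they never clash by name.
    """
    named = {
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    }
    return sorted(named & set(args_dict))


# Top-level "checkpoint" section as reported from args.json above
method_args = {
    "method": "byol",
    "checkpoint": {"enabled": True, "dir": "trained_models", "frequency": 25, "keep_prev": False},
}
print(colliding_keys(load_state, method_args))  # -> ['checkpoint']
```

Dropping such keys only clears this particular clash. As the next message shows, the constructor can still fail if its own required arguments are not supplied.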

After popping that key-value pair and running again, it raises:

```
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/tcecconello/miniconda3/envs/sololearn/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1543, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/home/tcecconello/miniconda3/envs/sololearn/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 91, in _load_from_checkpoint
    model = _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/home/tcecconello/miniconda3/envs/sololearn/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 144, in _load_state
    obj = cls(**_cls_kwargs)
TypeError: BYOL.__init__() missing 1 required positional argument: 'cfg'
```
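This second error is also plain argument binding. Roughly, the traceback shows Lightning building a dict of constructor keyword arguments and calling `cls(**_cls_kwargs)`. If none of those keys is named `cfg`, a constructor that requires `cfg` cannot be satisfied. The sketch below uses a hypothetical `FakeBYOL` class and a hypothetical `instantiate` helper. The helper keeps only the kwargs named in the constructor signature, and that filtering step is an assumption about the general pattern, not a copy of Lightning's code. The sketch shows why unpacking the config's keys fails while passing the whole config as a single `cfg=` argument works. A plain dict stands in for whatever config object the method really expects, and the thread does not say whether #376 takes this route:

```python
import inspect


class FakeBYOL:
    """Hypothetical stand-in: the constructor takes the whole config as one argument."""

    def __init__(self, cfg):
        self.cfg = cfg


def instantiate(cls, **extra_kwargs):
    """Hypothetical helper: keep only kwargs named in cls.__init__, then build cls."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    kwargs = {k: v for k, v in extra_kwargs.items() if k in accepted}
    return cls(**kwargs)


method_args = {"method": "byol", "backbone": {"name": "resnet18"}}

# Unpacking the config's keys: none is named "cfg", so all are filtered out
# and the constructor is called with no arguments.
try:
    instantiate(FakeBYOL, **method_args)
    error = None
except TypeError as err:
    error = str(err)
print(error)  # ... missing 1 required positional argument: 'cfg'

# Passing the whole config under the parameter name the constructor expects.
model = instantiate(FakeBYOL, cfg=method_args)
print(model.cfg["method"])  # -> byol
```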

vturrisi commented 7 months ago

Okay. I think it might be caused by a change in Lightning. I'll try to fix it later this week.

vturrisi commented 7 months ago

@dr4thmos sorry for the delay, I'm still quite busy this week as well. Can you share your checkpoint? Then it's easier for me to experiment.

dr4thmos commented 7 months ago

@vturrisi No problem, there's no hurry. Here's the args.json (attachment: args.json). Do you need the model checkpoint too?

vturrisi commented 7 months ago

@dr4thmos sorry for the mega long delay. Can you check if #376 fixes it?

EDIT: just tried it and it fixes the issue. I'll merge it into main; let me know if you still have any issues.