vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
MIT License

CustomDatasetWithoutLabels doesn't support 0<data.data_fraction<1 #327

Closed kaland313 closed 1 year ago

kaland313 commented 1 year ago

Describe the bug: Using a custom dataset without labels (and therefore solo.data.pretrain_dataloader.CustomDatasetWithoutLabels) together with a config value in the range 0<data.data_fraction<1 results in the following error:

Traceback (most recent call last):
  File "/workspace/solo-learn/main_pretrain.py", line 146, in main
    train_dataset = prepare_datasets(
  File "/workspace/solo-learn/solo/data/pretrain_dataloader.py", line 355, in prepare_datasets
    data = train_dataset.samples
AttributeError: 'DatasetWithIndex' object has no attribute 'samples'

As far as I can tell, when I use a custom dataset with no_labels: True, DatasetWithIndex is subclassed from CustomDatasetWithoutLabels, so the error is essentially AttributeError: 'CustomDatasetWithoutLabels' object has no attribute 'samples'.
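A minimal sketch of why the error names DatasetWithIndex rather than CustomDatasetWithoutLabels. The classes below are simplified stand-ins, not the solo-learn implementations, and the helper name dataset_with_index is an assumption used only for illustration. The only detail taken from this issue is that the no-labels dataset exposes images and has no samples attribute.

```python
class CustomDatasetWithoutLabels:
    """Stand-in: stores file paths in `images` and defines no `samples`."""

    def __init__(self, images):
        self.images = list(images)


def dataset_with_index(base_cls):
    """Hypothetical helper: wrap a dataset class in a subclass named DatasetWithIndex."""

    class DatasetWithIndex(base_cls):
        pass

    return DatasetWithIndex


dataset = dataset_with_index(CustomDatasetWithoutLabels)(["a.png", "b.png"])

try:
    dataset.samples  # neither the subclass nor its parent defines this
except AttributeError as err:
    message = str(err)
    print(message)
```

The message reports the subclass name, but the missing attribute comes from the parent class, which matches the traceback above.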

To Reproduce: In this gist, I provided a modified version of scripts/pretrain/custom/byol.yaml that sets data.data_fraction: 0.5. To reproduce the bug, copy it to scripts/pretrain/custom and run:

python main_pretrain.py --config-path scripts/pretrain/custom --config-name byol_datafraction.yaml

Screenshots: No screenshots required.

Versions: solo-learn==1.0.6, torch==1.13.1, pytorch-lightning==1.6.4

Additional comments: Looking at /solo/data/pretrain_dataloader.py#L353-L364 and the definition of CustomDatasetWithoutLabels, the cause of the issue is clear. It can be fixed by modifying /solo/data/pretrain_dataloader.py#L353-L364 to use train_dataset.images instead of train_dataset.samples when the dataset is an instance of CustomDatasetWithoutLabels. I'll provide this fix in a PR.
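The proposed fix could look roughly like the sketch below. This is not the code from the PR: the dataset classes are simplified stand-ins, subsample_dataset and its seed parameter are hypothetical names, and plain random sampling is used here in place of whatever splitting strategy the library applies. It only illustrates choosing images or samples depending on the dataset type.

```python
import random


class CustomDatasetWithoutLabels:
    """Stand-in for the no-labels dataset: file paths live in `images`."""

    def __init__(self, images):
        self.images = list(images)


class LabeledDataset:
    """Stand-in for an ImageFolder-style dataset: (path, label) pairs in `samples`."""

    def __init__(self, samples):
        self.samples = list(samples)


def subsample_dataset(dataset, data_fraction, seed=42):
    """Keep a random fraction of the dataset's items (hypothetical helper)."""
    if not 0 < data_fraction < 1:
        return dataset  # nothing to do outside the open interval (0, 1)

    # Pick the attribute that actually exists on this dataset type.
    attr = "images" if isinstance(dataset, CustomDatasetWithoutLabels) else "samples"
    items = getattr(dataset, attr)

    keep = max(1, int(len(items) * data_fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    setattr(dataset, attr, rng.sample(items, keep))
    return dataset


unlabeled = CustomDatasetWithoutLabels([f"img_{i}.png" for i in range(10)])
subsample_dataset(unlabeled, 0.5)
print(len(unlabeled.images))
```

With this branching, the same data_fraction option works for both labeled and unlabeled custom datasets without touching the dataset classes themselves.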

vturrisi commented 1 year ago

Closing since the PR seems to fix the issue. Thanks.