schelmi1 closed this 6 months ago
Hi and thanks for using Lightly!
I guess this is not explained well in the docs, but you don't have to use a LightlyDataset; you can use your CustomImageDataset instead. Just make sure to set the transforms :)
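As a rough illustration of the advice above (a sketch, not the poster's actual class): a map-style dataset only needs `__len__` and `__getitem__`, and the transform is applied per item inside `__getitem__`. The name `InMemoryZipDataset`, the folder-name-as-label convention, and the toy transform are all assumptions made for this example. With torch installed, one would typically subclass `torch.utils.data.Dataset` and pass a real image transform instead.

```python
import io
import zipfile


class InMemoryZipDataset:
    """Hypothetical map-style dataset that keeps every zip member in RAM."""

    def __init__(self, zip_source, transform=None):
        self.transform = transform
        self.samples = []  # list of (raw_bytes, label) pairs held in memory
        with zipfile.ZipFile(zip_source) as archive:
            for name in sorted(archive.namelist()):
                if name.endswith("/"):
                    continue  # skip directory entries
                # Assumption: the label is the top-level folder name.
                label = name.split("/")[0]
                self.samples.append((archive.read(name), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        data, label = self.samples[index]
        if self.transform is not None:
            data = self.transform(data)  # applied lazily, once per access
        return data, label


def build_demo_zip():
    """Create a tiny in-memory zip so the sketch runs without any files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("cats/a.bin", b"\x01\x02\x03")
        archive.writestr("dogs/b.bin", b"\x04\x05")
    buffer.seek(0)
    return buffer


# Toy transform (len) standing in for real image augmentations.
dataset = InMemoryZipDataset(build_demo_zip(), transform=len)
print(len(dataset), dataset[0], dataset[1])
```

Leaving `transform=None` returns the raw bytes unchanged, which is an easy way to check what the dataset yields before any augmentation is applied.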
Thank you. So it seems to be a pytorch_lightning problem then, as it does not work with the CustomImageDataset either.
Maybe try setting num_workers=0 to see if it works when the dataset is only in the main process. If it does, the issue might be related to starting the dataloader workers, as the dataset has to be copied to every worker process. You should be able to test this even outside pytorch lightning with:
from torch.utils.data import DataLoader

dataset = CustomImageDataset(...)
dataloader = DataLoader(dataset)
for batch in dataloader:
    print("got batch")
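To get a feel for the copying cost mentioned above: when worker processes are started with the spawn method (the default on Windows and macOS), the dataset object is pickled and sent to each worker. The sketch below uses only the standard library to estimate that per-worker cost for an in-RAM dataset. `RamDataset` and its sizes are made up for illustration, and this only approximates what a DataLoader does rather than reproducing its actual code path.

```python
import pickle
import time


class RamDataset:
    """Hypothetical dataset holding all samples in memory as bytes."""

    def __init__(self, num_items, item_size):
        self.items = [bytes(item_size) for _ in range(num_items)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]


def estimate_copy_cost(dataset):
    """Return (payload size in bytes, seconds for one pickle round trip)."""
    start = time.perf_counter()
    payload = pickle.dumps(dataset, protocol=pickle.HIGHEST_PROTOCOL)
    clone = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    assert len(clone) == len(dataset)  # the copy must be complete
    return len(payload), elapsed


dataset = RamDataset(num_items=1000, item_size=1024)
size, seconds = estimate_copy_cost(dataset)
print(f"one worker copy: {size / 1e6:.2f} MB in {seconds:.4f} s")
```

If the payload is large, every additional worker repeats that cost at startup, which may make the first batch appear to take a long time.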
Finally got it!
I'm running it from the command line as a .py file, inside an if __name__ == "__main__": block.
It works with any realistic number of workers and persistent_workers=True in the dataloader args.
It's also extremely fast (compared to your tutorial with loading from disk, probably because it's loading directly from RAM!?)
Great that you got it working! I'll close the issue for now.
> It's also extremely fast (compared to your tutorial with loading from disk, probably because it's loading directly from RAM!?)
Yes, loading from RAM is usually much faster than from disk.
Hi,
I'm playing around with the SimCLR tutorial and trying to train on a custom dataset class using LightlyDataset.from_torch_dataset(). Using the MNIST handwritten digits dataset with LightlyDataset(input_dir=path_to_images), everything works fine: training starts and finishes without any issues.
With my custom PyTorch dataset class, I'm loading all the files and labels from a zip file into RAM, then using
If I compare
dataset_train_simclr_custom.dataset.__getitem__(1)
and
dataset_train_simclr.dataset.__getitem__(1)
, it looks the same:
I put the datasets into the dataloader like in the tutorial
However with the custom dataset it never starts training after calling
trainer.fit(model, dataloader_train_simclr)
in contrast to the dataset created by just passing the folder path.
RAM is never full; I have also tried with far fewer images in the dataset.
My custom dataset class looks like this:
Any hint is appreciated.