Hi there,
Thanks for reporting this bug! Very strange that it is taking so long.
Can I ask what versions of Python and Python packages you are using (pip freeze)? Maybe something has broken?
Hi, thanks for the quick answer. I am using a conda environment with python=3.11, and all libraries were installed when installing disent from pip yesterday. The disent version is 0.8.0, torch is 2.4.0.
This slow behavior seems to happen only for DSpritesData and Shapes3dData. I have tried XYObjectData, SmallNorbData and Cars3dData, and for all three getting one batch takes ~0.5 seconds. For the other two it takes around 20 seconds. As said, with a typical use of DataLoader with both of these datasets I recover normal sampling times, which is why I think something is happening in DSpritesData, Shapes3dData or DisentDataset.
The code I use to get the loader for all these Data classes is:
data = DSpritesData()  # or Shapes3dData(), XYObjectData(), SmallNorbData(), Cars3dData()
dataset = DisentDataset(dataset=data, sampler=SingleSampler(), transform=ToImgTensorF32())
dataloader = DataLoader(dataset=dataset, batch_size=128, shuffle=True, num_workers=1)
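For completeness, these are the imports the snippet assumes (module paths as in the disent quickstart; treat them as an assumption if they differ in your installed version):
# imports assumed by the snippet above (paths per the disent quickstart)
from torch.utils.data import DataLoader
from disent.dataset import DisentDataset
from disent.dataset.data import DSpritesData, Shapes3dData, XYObjectData, SmallNorbData, Cars3dData
from disent.dataset.sampling import SingleSampler
from disent.dataset.transform import ToImgTensorF32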
I replicated your env.
There seem to be issues with how parallel processing interacts with the data loader.
If you set num_workers=0, the issue is resolved, but data is then only loaded with a single worker.
If you keep num_workers>0, it also seems to be much better if you pre-build the iterator:
dataloader_itr = iter(dataloader)
for item in dataloader_itr:
pass
Rather than either of the following, which is very strange:
for item in dataloader:
pass
# OR
for item in iter(dataloader):
pass
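For reference, a rough way to time the two patterns (just a sketch using the loader from the snippet above, nothing disent-specific):
import time

# rough timing sketch for the two patterns above; `dataloader` is assumed
# to be the DisentDataset loader built earlier in this thread (num_workers>0)
start = time.perf_counter()
dataloader_itr = iter(dataloader)   # pre-build the iterator once
for item in dataloader_itr:
    pass
print("pre-built iterator:", time.perf_counter() - start, "s")

start = time.perf_counter()
for item in dataloader:             # iterate the DataLoader directly
    pass
print("direct loop:", time.perf_counter() - start, "s")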
Yes, I think there is something weird going on with the parallelization, as having more workers actually leads to slower runs (even on the XYObjectData class). I have also found a way to solve the problem by setting the in_memory parameter in the data classes to True. Is this to be expected?
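Concretely, what I changed was roughly this (only the in_memory flag differs from the snippet above):
# same loader as before, but load the whole dataset into RAM up front
# instead of reading from the hdf5 file on every access
data = DSpritesData(in_memory=True)
dataset = DisentDataset(dataset=data, sampler=SingleSampler(), transform=ToImgTensorF32())
dataloader = DataLoader(dataset=dataset, batch_size=128, shuffle=True, num_workers=1)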
That is unexpected, that in_memory resolved this. It's possible h5py internals have changed, OR there is some other issue. I will try to dive into this a bit.
Are you running into saturation issues feeding your model with a single thread? (EDIT: I mean with workers=0.)
No, I was able to train in all cases, both with workers = 1 and with workers > 1.
With DSpritesData(in_memory=False), I am not noticing any performance issues after the data loaders are initialised. This is expected when spawning multiple worker processes, which have considerable setup and teardown time.
Here are benchmarks with tqdm for DSpritesData(in_memory=False). Initial setup time is still high, but after it gets going it seems fine.
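Roughly measured like this (a sketch, not the exact script):
from tqdm import tqdm

# tqdm reports the it/s rate while draining the loader; the first few
# iterations include the worker startup cost mentioned above
for batch in tqdm(dataloader, desc="DSpritesData(in_memory=False)"):
    pass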
No, I was able to train in all cases, both with workers = 1 and with workers > 1.
I'm glad it's working for you! Would be very curious to hear about your use case.
Going to close this as resolved if that's alright.
Ok, thank you very much for the quick answers! I have reproduced your tests and I now think that my problems may come from the datasets being stored on a secondary drive. I am still trying to understand why this makes increasing num_workers not help, but having the dataset on the same drive (as downloaded by your code) makes everything run smoothly.
Ahh, yes, this is a fairly large constraint of the original design of the dataset loading.
When I originally built this project I had to run on fairly resource-constrained systems (low memory, small GPUs); however, those machines had SSDs with relatively fast disk access. That is why I chose the hdf5 backend for the datasets and convert everything into that format, as I wasn't really able to store it all in memory.
The current implementation reads from disk on every single datapoint access when using hdf5, so a network drive OR an HDD probably has too much latency for in_memory=False. For decent performance here I would encourage you to use in_memory=True if you can, to get around the high latency of the network drive.
The way I got around this when running experiments on the cluster I had access to was to store the common data on the network drive, and then initialise or copy everything into /tmp (on an SSD, not network/HDD) when the job first starts. This gets around the network congestion and latency issues, as well as the memory limitations on the nodes.
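A rough sketch of that pattern (the paths and the data_root keyword here are illustrative assumptions, not exact code from my setup):
import os
import shutil

# illustrative only: copy the prepared dataset files from the shared/network
# drive onto the node's local SSD before building the data class
src = "/network/share/datasets/dsprites"   # hypothetical shared location
dst = "/tmp/datasets/dsprites"             # fast local disk on the node
if not os.path.exists(dst):
    shutil.copytree(src, dst)

# point the data class at the local copy (data_root here is an assumption)
data = DSpritesData(data_root=dst, in_memory=False)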
Hi! First of all, thanks for the awesome package! I have encountered a problem while loading datasets through DisentDataset. For example, I am running:
Then, getting batches from this dataloader is very slow (e.g. getting one batch using next(iter(dataloader)) takes more than 10 seconds). I have played a bit with the input parameters but couldn't make it work faster.
For instance, I wrote a custom class that creates a dataloader with similar properties (i.e. one that returns dictionaries with the x_targ key) as follows:
and this one works just fine.
Any clue what could be slowing down the dataloaders in DisentDataset? Thanks in advance!