Open moskomule opened 5 years ago
Hi, thanks for opening the issue. I'll have a look at this.
Thank you. Lately I've found that the image preprocessing steps are the bottleneck. I'll try DALI myself and report how much it speeds up processing.
albumentations is also a contender for faster image augmentation.
In my experience, IO is actually a worse bottleneck than a "slow" pre-processing library. SSDs and NVMes(!) help a lot.
Hi @datumbox it's been a while since this PR had any discussions, I'm curious if there are any plans to make this happen?
@msaroufim we are currently working to improve the data loading process using PyTorch Data. We do not have immediate plans for integrating DALI directly at the moment, but we can review this in the future. As we have very limited resources, I think it's more realistic that such an investigation can happen after the release of the new Datasets API.
ccing @NicolasHug and @pmeier who lead the work on datasets.
Oh interesting so the way you'd integrate new backends in the future is to integrate them within `torch.data`? Also where can I learn more about the new Datasets API?
cc @VitalyFedyunin @ejguan @wenleix
> Oh interesting so the way you'd integrate new backends in the future is to integrate them within `torch.data`?
Not sure what you mean by "backends" here. In general you are right though. `torchdata` is the way to go for the new datasets.
> Also where can I learn more about the new Datasets API?
There is no public document yet. However, we already have quite a large collection of datasets ported to the new structure. You can access them with `torchvision.prototype.datasets.load(name)`, where `name` is the name of the dataset you want to load. For example:

```python
from torchvision.prototype import datasets

dataset = datasets.load("voc")
```
The `dataset` object is a regular `IterDataPipe` defined by `torchdata`. To transform it you can use the `.map` method. It takes a callable that will be executed for each sample in the dataset. This sample will be a dictionary with `str` keys. For example, a simple data pipeline could look like this:

```python
from torchvision.prototype import transforms

transform = transforms.Compose(
    transforms.DecodeImage(),
    transforms.Resize(256),
    transforms.CenterCrop(256),
)

for sample in dataset.map(transform):
    ...
```
For everything else, please also have a look at the `torchdata` documentation.
Adding to @pmeier's comment, this tutorial might help you.
@pmeier to clarify, by backend I mean one of these: https://github.com/pytorch/vision#image-backend - i.e. pillow, accimage, pillow-simd, etc.
Overall the new interface for adding datasets looks good, but I'm more curious about adding new backends like DALI. In particular, DALI has accelerated image processing kernels and accelerated image decoding, which I think would be very useful to integrate into vision directly. It feels too domain-specific to be in `torch.data` IMHO, and it is similar enough to other backends like accimage to live in vision. What's the process like for adding a new backend? If it's similar to the one for accimage (https://github.com/pytorch/vision/blob/main/torchvision/transforms/functional.py#L13), I can make a PR for this.
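For context, the accimage integration linked above works by tracking a module-level backend name and dispatching on it at load time. A new backend like DALI could plausibly follow the same pattern. Here is a minimal, self-contained sketch of that registry idea; the function names mirror torchvision's real `set_image_backend`/`get_image_backend`, but this standalone version (and the `"dali"` option) is illustrative only, not torchvision's actual implementation:

```python
# Sketch of torchvision-style image-backend switching (illustrative only).

_image_backend = "PIL"  # default backend name
_KNOWN_BACKENDS = {"PIL", "accimage", "dali"}  # "dali" is hypothetical here


def set_image_backend(backend: str) -> None:
    """Select which library is used for image loading/decoding."""
    global _image_backend
    if backend not in _KNOWN_BACKENDS:
        raise ValueError(
            f"Invalid backend {backend!r}. Options are {sorted(_KNOWN_BACKENDS)}"
        )
    _image_backend = backend


def get_image_backend() -> str:
    """Return the name of the currently selected backend."""
    return _image_backend


def load_image(path: str):
    # Dispatch on the selected backend, importing lazily (as torchvision does)
    # so that optional backends are only required when actually selected.
    if get_image_backend() == "accimage":
        import accimage
        return accimage.Image(path)
    elif get_image_backend() == "dali":
        raise NotImplementedError("a DALI-backed decode would go here")
    else:
        from PIL import Image
        return Image.open(path)
```

The lazy imports inside `load_image` are the key design choice: adding a backend then only requires extending the dispatch, without making the new library a hard dependency.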
The other option is to integrate the DALI data loader as a data pipe in torch.data
Here's a good primer on DALI and its value proposition https://cceyda.github.io/blog/dali/cv/image_processing/2020/11/10/nvidia_dali.html
@VitalyFedyunin @wenleix please chime in on where you think the most natural place for a DALI integration is
> The other option is to integrate the DALI data loader as a data pipe in `torch.data`
Thanks @msaroufim, I had the same feeling about making it a separate DataPipe, because it requires different behavior compared with `datapipe.map`, like making sure this DataPipe only runs in a single process to prevent the CUDA context from being copied around. It definitely needs a deeper look at DALI itself.
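The single-process constraint mentioned above could be enforced as a guard inside the datapipe itself. A rough sketch of the idea in plain Python; a real version would subclass `torchdata`'s `IterDataPipe` and wrap an actual DALI pipeline, and the `ExternalLoaderDataPipe` name and PID-based check here are hypothetical:

```python
import os


class ExternalLoaderDataPipe:
    """Hypothetical wrapper around an external (e.g. DALI-style) iterator.

    Illustrates refusing to iterate from a forked worker process, since a
    CUDA context copied into a child process is invalid there.
    """

    def __init__(self, make_iterator):
        # Store a factory rather than a live iterator so no GPU/CUDA state
        # is created until iteration actually starts.
        self._make_iterator = make_iterator
        self._owner_pid = os.getpid()

    def __iter__(self):
        if os.getpid() != self._owner_pid:
            raise RuntimeError(
                "This datapipe must run in the process that created it; "
                "it cannot be replicated across DataLoader workers."
            )
        return iter(self._make_iterator())
```

Usage would look like `list(ExternalLoaderDataPipe(lambda: my_loader()))` in the main process; iterating from a forked worker raises instead of silently corrupting the CUDA context.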
Seems like there's a good workaround too https://github.com/NVIDIA/DALI/issues/3081#issuecomment-866239816 - I'll take a more thorough look
@msaroufim

> to clarify by backend I mean one of these https://github.com/pytorch/vision#image-backend - i.e. pillow, accimage, pillow-simd, etc.
The new datasets will return a `features.EncodedImage`, which is a 1D `uint8` tensor just storing the raw bytes. You can decode it however you want. Right now, `transforms.DecodeImage()` uses PIL as backend, but you can use arbitrary backends there.
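Since `EncodedImage` is just raw encoded bytes, swapping in a different decoder amounts to routing those bytes to the backend of your choice. A hedged sketch of that dispatch; `decode_with` is not a torchvision API, and the `"dali"` branch is a placeholder for an accelerated decoder:

```python
import io


def decode_with(raw: bytes, backend: str = "PIL"):
    """Illustrative dispatch: hand raw encoded image bytes to a chosen decoder.

    Only the PIL branch is fleshed out; "dali" stands in for an accelerated
    decoder such as DALI's GPU JPEG decoding.
    """
    if backend == "PIL":
        from PIL import Image  # imported lazily, only when this backend is used
        return Image.open(io.BytesIO(raw))
    if backend == "dali":
        raise NotImplementedError("an accelerated decode would go here")
    raise ValueError(f"unknown backend {backend!r}")
```

A transform built on this could then be dropped into the `.map` pipeline shown earlier, decoding each sample's bytes with whichever backend was selected.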
A similar issue is open on the torchdata repo: https://github.com/pytorch/data/issues/761. Might be good to keep an eye on this :)
Hi, any plan to integrate DALI (https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/index.html) into `torchvision` for faster preprocessing? I found `chainer` tries to integrate it (https://github.com/chainer/chainer/pull/5067).