As discussed in https://github.com/skorch-dev/skorch/issues/524, it is desirable to be able to tune the parameters of dataset transforms (such as those from torchvision) using parameter searches. For this, we should provide a wrapper as shown in https://github.com/skorch-dev/skorch/issues/524#issuecomment-533883353.
Usage example:
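A minimal sketch of what such a wrapper could look like (the name `ParamTransformDataset` and the `transform__`-style parameter routing are assumptions for illustration, not a final skorch API; a real version would wrap a `torch.utils.data.Dataset` and a torchvision transform):

```python
class ParamTransformDataset:
    """Hypothetical sketch: wrap a dataset plus a transform and expose
    the transform's attributes via get_params/set_params, so a grid
    search can tune them with keys like 'transform__factor'."""

    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def get_params(self, deep=True):
        params = {'dataset': self.dataset, 'transform': self.transform}
        if deep:
            # expose each transform attribute under a 'transform__' prefix
            for key, val in vars(self.transform).items():
                params['transform__' + key] = val
        return params

    def set_params(self, **params):
        for key, val in params.items():
            if key.startswith('transform__'):
                # route prefixed parameters to the wrapped transform
                setattr(self.transform, key[len('transform__'):], val)
            else:
                setattr(self, key, val)
        return self

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        X, y = self.dataset[idx]
        return self.transform(X), y


# toy transform with one tunable parameter (stands in for a
# torchvision transform such as Resize or ColorJitter)
class Scale:
    def __init__(self, factor=1.0):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


data = [(1, 0), (2, 1)]
ds = ParamTransformDataset(data, Scale(factor=2.0))
ds.set_params(transform__factor=3.0)  # what a param search would do
sample = ds[0]  # (3.0, 0)
```

With this interface, a search over e.g. `{'dataset__transform__factor': [1.0, 2.0, 3.0]}` becomes possible once the wrapper is plugged into skorch's parameter routing.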
It would be nice to support different modalities, not only vision, but this is not a strict requirement. If it turns out to be very complicated to write a general wrapper, we should support vision first and look for a general solution later (one reason being that PyTorch audio and text are in flux right now).