Is your feature request related to a problem? Please describe.
We currently don't set num_workers for the data-loading subprocesses described in the torch documentation, which results in console warnings that neither validation nor training has any workers available.
Adding that variable to both and setting it to 4 for each results in a jump from about 1.40 it/s to 2.50 it/s, at least in my testing.
From the torch DataLoader documentation (https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader):
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
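For reference, a minimal sketch of what enabling workers looks like on a plain DataLoader (the dataset, batch size, and worker count here are placeholders, not the project's actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; the real project would pass its own Dataset here.
dataset = TensorDataset(torch.randn(256, 3, 64, 64))

if __name__ == "__main__":  # required when workers are spawned (e.g. on Windows/macOS)
    # num_workers=4 spawns four loader subprocesses instead of loading in the main process.
    loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
    for (batch,) in loader:
        pass  # training / validation step would run here
```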
Describe the solution you'd like
Implement support for num_workers, perhaps as extra fields for both the validator and the trainer in the config.
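As a rough illustration only (the config keys and the build_dataloader helper below are hypothetical, not existing code in this repo):

```python
from torch.utils.data import DataLoader

# Hypothetical config fields; actual key names would follow the project's config schema.
config = {
    "trainer": {"num_workers": 4},
    "validator": {"num_workers": 4},
}

def build_dataloader(dataset, batch_size, num_workers):
    """Hypothetical helper that forwards the configured worker count to the DataLoader."""
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)

# e.g. train_loader = build_dataloader(train_dataset, 16, config["trainer"]["num_workers"])
```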
Additional context
There is currently one downside:
Doing so will, at least with the current PyTorch version, print multiple warnings in the console (and logs) about TypedStorage being deprecated.
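If those messages get too noisy, one possible workaround (assuming the warning text begins with "TypedStorage is deprecated"; silencing it should probably stay opt-in rather than the default) is a standard warnings filter:

```python
import warnings

# Suppress only the TypedStorage deprecation warning; other UserWarnings still show.
warnings.filterwarnings(
    "ignore",
    message="TypedStorage is deprecated",
    category=UserWarning,
)
```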
I have also noticed that with this change VRAM usage stays steady (14.8 GB with batch size 16, around 16.8 GB with batch size 20).
It drops to a lower usage whenever checkpoints are being saved, but that's fine.
I assume this is also part of where the improvement comes from: data no longer has to be constantly reloaded through the main thread/process, and new batches can be loaded in the background.