project-lighter / lighter

Config-based framework for organized and reproducible deep learning. MONAI Bundle + PyTorch Lightning.
https://project-lighter.github.io/lighter
MIT License

Enhance Dataloader Configuration #60

Open surajpaib opened 1 year ago

surajpaib commented 1 year ago

🚀 Feature Request

Several DataLoader arguments are currently exposed as System parameters, e.g., batch_size and drop_last_batch.

It would be good to have a way to set other DataLoader parameters such as prefetch_factor, persistent_workers, and whatever gets added to the DataLoader in the future.

🛰 Alternatives

Maybe we could add a partial DataLoader to the System config and pass it the dataset and sampler later?
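
As a rough sketch of that idea (not Lighter's API; `make_dataloader`, `train_dataset`, and `train_sampler` are illustrative names), the config could pin the loader arguments with `functools.partial` and the System would supply the dataset and sampler once they exist:

```python
from functools import partial

from torch.utils.data import DataLoader

# The config fixes the loader arguments up front...
make_dataloader = partial(
    DataLoader,
    batch_size=4,
    num_workers=8,
    prefetch_factor=2,
    persistent_workers=True,
    drop_last=True,
)

# ...and the System fills in the data later, once it is known:
# train_loader = make_dataloader(train_dataset, sampler=train_sampler)
```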

ibro45 commented 7 months ago

Discussed with @john-zielke-snkeos:

```yaml
dataloaders:
    train:
        batch_size: 4
        num_workers: 8
```

The rest of the arguments would be set by default. If a user needs a completely different DataLoader, they can go ahead and define _target_: ..., but we need to make sure that the other default args aren't passed to it in that case.
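
A rough illustration of that behavior (not Lighter's actual implementation; `DATALOADER_DEFAULTS` and `build_dataloader` are hypothetical names): per-mode settings from the config are merged over library defaults, while a user-supplied _target_ bypasses the defaults entirely.

```python
from torch.utils.data import DataLoader, Dataset

# Hypothetical library-wide defaults; user config values take precedence.
DATALOADER_DEFAULTS = {
    "batch_size": 1,
    "num_workers": 0,
    "pin_memory": True,
    "drop_last": False,
}

def build_dataloader(dataset: Dataset, mode_cfg: dict) -> DataLoader:
    """Build the DataLoader for one mode ("train", "val", ...) from its config section."""
    if "_target_" in mode_cfg:
        # The user defined the whole DataLoader themselves: let the MONAI Bundle
        # parser instantiate it and do not inject any of the defaults.
        raise ValueError("A custom _target_ is handled by the config parser, not here.")
    kwargs = {**DATALOADER_DEFAULTS, **mode_cfg}  # user-specified keys win
    return DataLoader(dataset, **kwargs)
```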

surajpaib commented 7 months ago

@ibro45 Agree with this.

This brings us into the territory of templates, where we set a default object for the dataloaders.

If we do this for dataloaders, giving them a default behaviour the user can rely on, shouldn't we do the same for other items in the config as well?

For instance, the trainer could default to pytorch_lightning.Trainer with benchmark=True, precision="16-mixed", etc.
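
A minimal sketch of what such a trainer template could look like, assuming a simple dict-merge where user settings override the defaults (`TRAINER_DEFAULTS` and `build_trainer` are hypothetical names):

```python
import pytorch_lightning as pl

# Hypothetical template of Trainer defaults; anything the user sets overrides them.
TRAINER_DEFAULTS = {
    "benchmark": True,        # cudnn autotuner, useful for fixed-size inputs
    "precision": "16-mixed",  # mixed-precision training
}

def build_trainer(**user_overrides) -> pl.Trainer:
    return pl.Trainer(**{**TRAINER_DEFAULTS, **user_overrides})

trainer = build_trainer(max_epochs=10, devices=1)
```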

surajpaib commented 7 months ago

We can also extend this templating and have several templates for different workflows.

Say we want a classification workflow: we could set up templates for a few different models and losses, then provide a simple CLI that lets the user choose between these templates and spits out a final config where they only have to plug in their data. (Unlike the dataloader defaults, these templates wouldn't be applied automatically.)

We could use something like Cookiecutter (https://github.com/cookiecutter/cookiecutter) to map the user's CLI choices to pre-set templates.
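
For reference, a hedged sketch of driving Cookiecutter programmatically; the template URL and context keys below are hypothetical placeholders, not an existing Lighter template:

```python
from cookiecutter.main import cookiecutter

# Generate a config/project from a template repo, answering the prompts from code.
cookiecutter(
    "https://github.com/project-lighter/config-templates",  # hypothetical template repo
    no_input=True,
    extra_context={
        "workflow": "classification",
        "model": "resnet18",
        "loss": "cross_entropy",
    },
)
```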

This would be a separate feature, of course, and should go in its own issue if we agree to do it, but templating could give us a lot of extra functionality without compromising the dynamism of the library.

ibro45 commented 7 months ago

Seems like Pydantic could be the way to go in this case. I will attempt to refactor it sometime soon, hopefully over the weekend. The combination of Pydantic and MONAI Bundle would somewhat resemble Hydra's integration with dataclasses (structured configs).
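
A rough illustration of how Pydantic could validate the dataloader settings before anything is instantiated; the field names mirror torch.utils.data.DataLoader arguments, but `DataLoaderConfig` itself is a hypothetical schema (Pydantic v2 syntax):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field

class DataLoaderConfig(BaseModel):
    # Reject unknown keys so typos in the YAML fail loudly.
    model_config = ConfigDict(extra="forbid")

    batch_size: int = Field(1, gt=0)
    num_workers: int = Field(0, ge=0)
    pin_memory: bool = True
    drop_last: bool = False
    prefetch_factor: Optional[int] = None
    persistent_workers: bool = False

# A misspelled key (e.g. "presistent_workers") or an out-of-range value raises a
# descriptive ValidationError instead of failing deep inside DataLoader.
cfg = DataLoaderConfig(batch_size=4, num_workers=8)
```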

Let's discuss the defaults in the future PR.