meteofrance / py4cast

Weather forecasting with Deep Learning
11 stars 10 forks source link

Adding Datamodule #66

Closed CorentinSeznec closed 1 month ago

CorentinSeznec commented 1 month ago
A669015 commented 1 month ago

Good job ! I just wonder if we should have one big datamodule that manages all datasets, or one datamodule per dataset ? I think that if we want to use lightning CLI, it uses the second options. @A669015 what do you think ?

Yes that is the philosophy behind the Datamodule to be associated to a Dataset. I guess a "big" generic Datamodule could be used if the datasets are transparently replaceable each other (finally that is the philosophy of py4cast) . Is there for some datasets the requirement of dataset-specific parameters, that could then make less nice the usage of the lightning CLI with a generic datamodule ?

colon3ltocard commented 1 month ago

Good job ! I just wonder if we should have one big datamodule that manages all datasets, or one datamodule per dataset ? I think that if we want to use lightning CLI, it uses the second options. @A669015 what do you think ?

The dataset name being a str arg to the datamodule I think even as it is the CLI would introspect and be able to pass the argument from the cli to the datamodule and thus dataset selection would work.

https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_advanced_3.html#multiple-models-and-or-datasets

data:
  class_path: py4cast.lightning.plDataModule
  init_args:
    dataset: titan
...

But it is probably more flexible to have one DataModule subclass per dataset.