leap-stc / ClimSim

An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators.
https://leap-stc.github.io/ClimSim/
Apache License 2.0

Adding PyTorch support in data_utils #79

Closed jackmiller2003 closed 8 months ago

jackmiller2003 commented 8 months ago

This pull request adds the option of using PyTorch as the ML backend in data_utils. With the PyTorch backend selected, the load_ncdata_with_generator function returns a PyTorch dataset instead of a TensorFlow one.
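As a rough sketch of the idea (not the actual code in data_utils.py), a backend switch around a sample generator could look like the following. The function name make_dataset, its arguments, and the backend strings "tensorflow" and "pytorch" are hypothetical and chosen for illustration only; float32 samples are also an assumption.

```python
def make_dataset(sample_generator, ml_backend="tensorflow"):
    """Wrap a generator of (input, target) array pairs in a backend dataset.

    `sample_generator` is a zero-argument callable that returns a fresh
    iterator of (input, target) NumPy arrays. All names are illustrative.
    """
    if ml_backend not in ("tensorflow", "pytorch"):
        raise ValueError(f"Unknown ml_backend: {ml_backend!r}")

    if ml_backend == "tensorflow":
        # Imported lazily so that only the chosen backend must be installed.
        import tensorflow as tf

        # shape=None leaves the rank unspecified; float32 is an assumption.
        spec = tf.TensorSpec(shape=None, dtype=tf.float32)
        return tf.data.Dataset.from_generator(
            sample_generator, output_signature=(spec, spec)
        )

    import torch
    from torch.utils.data import IterableDataset

    class GeneratorDataset(IterableDataset):
        """Iterable-style dataset that converts each sample to tensors."""

        def __iter__(self):
            for x, y in sample_generator():
                yield torch.as_tensor(x), torch.as_tensor(y)

    return GeneratorDataset()
```

With the PyTorch branch, the returned object can be passed to torch.utils.data.DataLoader in the usual way. Importing each framework inside its own branch keeps the other one optional.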

The PR makes the following changes:

  1. Modified data_utils.py so that one can choose an ml_backend (currently either PyTorch or TensorFlow, defaulting to TensorFlow). The chosen backend is used in load_ncdata_with_generator and therefore everywhere else in the class that relies on it.
  2. Modified setup.py so that one can set an environment variable to install PyTorch instead of TensorFlow. Running setup.py as usual, without the variable set, still installs TensorFlow.
  3. Added testing_data_utils_with_backends.py, a small script that checks that both backends can be used correctly and that data can still be saved to NumPy arrays.
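To illustrate the setup.py change, here is a minimal sketch of choosing the ML dependency from an environment variable. The variable name CLIMSIM_ML_BACKEND and the helper select_ml_requirement are hypothetical; the PR text does not name the actual variable, so treat this as one possible shape rather than the code in setup.py.

```python
import os


def select_ml_requirement(env=None):
    """Return the pip package to install for the chosen ML backend.

    Reads the hypothetical CLIMSIM_ML_BACKEND variable and falls back to
    TensorFlow when it is unset, matching the default described above.
    """
    env = os.environ if env is None else env
    backend = env.get("CLIMSIM_ML_BACKEND", "tensorflow").strip().lower()
    packages = {"tensorflow": "tensorflow", "pytorch": "torch"}
    if backend not in packages:
        raise ValueError(f"Unsupported ML backend: {backend!r}")
    return packages[backend]


# In setup.py this could feed install_requires, for example:
# setup(..., install_requires=[..., select_ml_requirement()])
```

Under this assumption, installing with the variable set to pytorch would pull in torch, while a plain install would pull in tensorflow.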

I was also informed that a new testing framework is coming; I am happy to open another PR with proper tests once it lands. For now, I have tested the logic in data_utils.py with the script testing_data_utils_with_backends.py (including a comparison of the output arrays from both backends) and tested the changes to setup.py by installing into separate virtual environments.
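The kind of output comparison mentioned above can be sketched as follows. This is not the content of testing_data_utils_with_backends.py; the helper names stack_batches and backends_agree and the tolerance values are assumptions made for illustration.

```python
import numpy as np


def stack_batches(batches):
    """Concatenate an iterable of array-like batches along the first axis."""
    return np.concatenate([np.asarray(b) for b in batches], axis=0)


def backends_agree(batches_a, batches_b, rtol=1e-6, atol=0.0):
    """Return True if two backends produced numerically matching arrays."""
    a = stack_batches(batches_a)
    b = stack_batches(batches_b)
    # Compare shapes first so that broadcasting cannot hide a mismatch.
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol, atol=atol))
```

Each argument would be the batches collected from one backend, converted to NumPy arrays before comparison.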