openkim / kliff

KIM-based Learning-Integrated Fitting Framework for interatomic potentials.
https://kliff.readthedocs.io
GNU Lesser General Public License v2.1
34 stars 20 forks source link

Trainer framework #178

Closed ipcamit closed 1 month ago

ipcamit commented 2 months ago

I dont expect this PR to be immediately merged. But I think I can use a fresh pair of eyes for the trainer. Now it has three trainers, KIM trainer, DNN trainer for descriptors, and GNN trainer using torch lightining. I think at this point I can really use comments and recommendations.

ipcamit commented 2 months ago

I will go through the comments in coming week, as I get some time (currently working on making the Mac package for libdescriptor so all tests can pass). But regarding your major questions:

Can we merge torch_trainer.py and lightning_trainer.py, to use lightning functions and avoid recreating the wheels?

The reason for two separate trainers is that Lightning can get bit rigid at times, and there might be more complicated workflows. For example, EMA updates of weights needs explicitly uploading them to GPU, as Lightning did not have support for it yet. In such cases having a pure torch trainer might be useful. For large models, Lightning is essential, but for small, more innovative research issues, like say TorchMD-net etc, torch based trainer will be better on the side.

For models, we support three modes, kim, tar, and graph. If I understand correctly, tar is just a local tarball of the kim mode. So can we merge these two?

Yes, tar is just KIM model which is tarball. It can also be a ML model, in which case we will pluck out the torchscript wile and use that as model. So in that sense tar = KIM physics + KIM ML models. But you are right, I think I can merge them such that it will treat tarballs are ML or KIM model based on files inside.

ipcamit commented 2 months ago

I have added some changes as requested. Some are pending for next iteration (along with additions to torch lightning). I would like to discuss any changes regarding the API and design. As if that is finalized, I can start cleaning up the code and making it more consistent.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 9.75855% with 897 lines in your changes missing coverage. Please review.

Project coverage is 53.93%. Comparing base (9ee3ce8) to head (b7a157f).

Files Patch % Lines
kliff/trainer/base_trainer.py 0.00% 263 Missing :warning:
kliff/trainer/torch_trainer.py 0.00% 193 Missing :warning:
kliff/trainer/lightning_trainer.py 0.00% 173 Missing :warning:
kliff/models/kim.py 13.26% 85 Missing :warning:
kliff/trainer/kim_trainer.py 0.00% 79 Missing :warning:
kliff/dataset/dataset.py 41.12% 63 Missing :warning:
kliff/utils.py 22.22% 14 Missing :warning:
.../configuration_transforms/graphs/generate_graph.py 56.52% 10 Missing :warning:
kliff/dataset/weight.py 64.28% 5 Missing :warning:
kliff/trainer/kim_residuals.py 0.00% 5 Missing :warning:
... and 3 more
Additional details and impacted files ```diff @@ Coverage Diff @@ ## v1 #178 +/- ## ========================================== - Coverage 63.14% 53.93% -9.21% ========================================== Files 50 56 +6 Lines 4718 5699 +981 ========================================== + Hits 2979 3074 +95 - Misses 1739 2625 +886 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ipcamit commented 1 month ago

Added fresh PR as per discussed. Only containing the lightning trainer for now. Will add KIM next. Removing this PR