nuclear-multimessenger-astronomy / nmma

A pythonic library for probing nuclear physics and cosmology with multimessenger analysis
https://nuclear-multimessenger-astronomy.github.io/nmma/
GNU General Public License v3.0
33 stars 58 forks source link

Switch pickle model format to joblib #290

Closed bfhealy closed 8 months ago

bfhealy commented 10 months ago

This PR modifies NMMA to support joblib model format instead of pickle, owing to the smaller footprint of the joblib files and the ability to implicitly decompress downloaded models using joblib.load. Models on gitlab have been updated to be compatible with both the current and incoming code.

Zenodo is not yet updated with the joblib models - perhaps we'll want to deprecate its use in favor of gitlab.

Failing tests appear to be from coveralls rather than the pytest run.

bfhealy commented 10 months ago

@Theodlz This is a framework for handling the move from pickle to joblib model files on the NMMA side of things. Perhaps we could add joblib files to the gitlab repo for testing while leaving the pickle files in place for now? I think I remember the only issue with gitlab being individual file size, rather than total file count.

Theodlz commented 9 months ago

Hi @bfhealy, sorry I didn't see this first mention, but the merging with main sent me an email notification!

Sure thing, I can do that tomorrow morning or later this evening. I'll let you know once this is done!

bfhealy commented 9 months ago

No worries @Theodlz, thanks for your help!

bfhealy commented 9 months ago

Hi @Theodlz, just wanted to send a ping on this.

Theodlz commented 8 months ago

@bfhealy @mcoughlin this looks very good to me. I like the idea of deprecating Zenodo, at least in its current format (one NMMA repo, and not one repo per NMMA model) given that we can't update anything on it anymore anyway.

bfhealy commented 8 months ago

@mcoughlin @Theodlz For people using the current code, my updates to the gitlab repo will keep things working as intended, since I didn't delete any of the old files. I think a higher level of backward compatibility (e.g. allowing the option of pkl vs joblib format) would be more involved to implement, and might increase the chances of out-of-date models continuing to be used. I agree it's more sustainable to move toward joblib going forward.

mcoughlin commented 8 months ago

Ok I am ok with this going in once ready @bfhealy. Thanks!

Theodlz commented 8 months ago

@bfhealy as you just said, having both models on gitlab is enough backwards compatibility already. You use the old NMMA it works, you use the new one it works (+ you'll redownload the models, but that's fine). ✅

Let me know if you need help removing Zenodo from the repo. Might need to update the docs too to point to Gitlab but not Zenodo

bfhealy commented 8 months ago

Sounds good! I'll work on removing zenodo from the repo before merging.

bfhealy commented 8 months ago

@mcoughlin My only remaining question is about the DOI that points to the zenodo repository in CITATION.cff. Do you think that should remain as-is for now, or be changed to a url with the gitlab link?

mcoughlin commented 8 months ago

Yeah time to update