Closed bfhealy closed 8 months ago
@Theodlz This is a framework for handling the move from pickle to joblib model files on the NMMA side of things. Perhaps we could add joblib files to the gitlab repo for testing while leaving the pickle files in place for now? I think I remember the only issue with gitlab being individual file size, rather than total file count.
Hi @bfhealy, sorry I didn't see this first mention, but the merging with main sent me an email notification!
Sure thing, I can do that tomorrow morning or later this evening. I'll let you know once this is done!
No worries @Theodlz, thanks for your help!
Hi @Theodlz, just wanted to send a ping on this.
@bfhealy @mcoughlin this looks very good to me. I like the idea of deprecating Zenodo, at least in its current format (one NMMA repo, and not one repo per NMMA model) given that we can't update anything on it anymore anyway.
@mcoughlin @Theodlz For people using the current code, my updates to the gitlab repo will keep things working as intended, since I didn't delete any of the old files. I think a higher level of backward compatibility (e.g. allowing the option of pkl vs joblib format) would be more involved to implement, and might increase the chances of out-of-date models continuing to be used. I agree it's more sustainable to move toward joblib going forward.
Ok I am ok with this going in once ready @bfhealy. Thanks!
@bfhealy as you just said, having both models on gitlab is enough backwards compatibility already. You use the old NMMA it works, you use the new one it works (+ you'll redownload the models, but that's fine). ✅
Let me know if you need help removing Zenodo from the repo. Might need to update the docs too to point to Gitlab but not Zenodo
Sounds good! I'll work on removing zenodo from the repo before merging.
@mcoughlin My only remaining question is about the DOI that points to the zenodo repository in CITATION.cff
. Do you think that should remain as-is for now, or be changed to a url with the gitlab link?
Yeah time to update
This PR modifies NMMA to support joblib model format instead of pickle, owing to the smaller footprint of the joblib files and the ability to implicitly decompress downloaded models using
joblib.load
. Models on gitlab have been updated to be compatible with both the current and incoming code.Zenodo is not yet updated with the joblib models - perhaps we'll want to deprecate its use in favor of gitlab.
Failing tests appear to be from coveralls rather than the pytest run.