load model only upon first use? - Githubissues

kjappelbaum / oximachinerunner

An easy API for using oximachine.

MIT License

7 stars 5 forks source link

load model only upon first use? #43

Closed ltalirz closed 3 years ago

ltalirz commented 3 years ago

When oximachinerunner is used as a dependency, you will often want to reuse an oximachinerunner instance (in order to avoid reloading the model each time).

I.e. a convenient way to do this is to place the oximachine runner instance at the module level, from where it can then be imported. However, this currently means that already importing the package triggers the download of the models, even if the oximachinerunner is perhaps never used.

Would you be open to make the model, scaler and featureset properties operate on an internal cache that is populated lazily on first use (run_oximachine)?

kjappelbaum commented 3 years ago

However, this currently means that already importing the package triggers the download of the models, even if the oximachinerunner is perhaps never used.

this is a good point, and for which reason we had the automatic_download flag. Currently this is trigged directly upon creation of the object (hence this flag is not so useful), but I could move this to another method that is only called when one of the model-dependent functions is called.

kjappelbaum commented 3 years ago

can't promise that I'll do it today (i want to first include the neural net---which is anyhow much less memory intensive), but should be done by the end of the week

ltalirz commented 3 years ago

I can do it myself as well; just wanted to know whether you would consider moving to this mode

kjappelbaum commented 3 years ago

yea, sure. I don't see a drawback of this right now. I'd just try to make some private method for this download such that we could also call it (i.e., populate the cache) before using some of the private methods. In the oximachinetool for example i use some of the private methods and not the run_oximachine method.

kjappelbaum commented 3 years ago

i'm just think of something like

    def _load_model(self):
        self.model, self.scaler, self.featureset = load_model(
            self.modelname, self.automatic_download
        )
        self.loaded = True

i think this works now as you'd want it to work, i'm just not sure about the featurenames

kjappelbaum commented 3 years ago

Is the new behavior as you expected/wanted?

ltalirz commented 3 years ago

ah thanks!

I might have taken slightly different route [1], but this does the job! e.g. this will speed up the building of the aiida-lsmo docs ;-)

[1] my approach would probably have been slightly different; e.g. adding a self.cache property that does the loading when called (populating self._cache) hand have model, scaler and featureset be @properties that simply take self.cache['model'], ... That way one doesn't have to think about where one needs to load the model first.