TomScheffers opened 2 years ago
There's a `cache=<filepath>` parameter for `lleaves.Model.compile()`. Does that do what you're looking for? See the docs for more info. I looked into supporting pickling a while ago, but the cache parameter seemed like the cleaner solution.
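The cache workflow can be sketched roughly like this (a sketch, assuming lleaves is installed; the `load_model` helper and the paths are illustrative, not part of the lleaves API):

```python
def load_model(model_txt_path, cache_path):
    """Compile a LightGBM model once, reusing the compiled cache afterwards.

    On the first call, lleaves compiles the model and writes the native
    code to cache_path; on later calls it loads the cache instead of
    recompiling (which the thread reports can take ~10 minutes).
    """
    import lleaves  # assumes lleaves is installed

    model = lleaves.Model(model_file=model_txt_path)
    model.compile(cache=cache_path)  # writes the cache if missing, else loads it
    return model
```

Since `compile(cache=...)` handles both the cold and warm case, the same call path works for the first run and every run after it.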
Thanks for your quick response. That should do the job! A nice addition would be a `@classmethod` (`lleaves.Model.from_cache`) that initializes a model directly from the cache, since currently you still have to initialize with the `model_txt`.
Love your work on this package. FYI: I get a ~10x speedup compared to the `lightgbm.predict` method, using a lot of categorical variables.
Yeah, you're right, a classmethod would be nicer. Currently, what's stored in the cache is an ELF file (on Linux) containing the compiled function. Recreating a `lleaves.Model` from the ELF file alone would require storing information about e.g. the `pandas_categoricals` (which is a list of lists of strings) as a static variable in the ELF file, which sounds like a PITA.
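Since that metadata is just nested strings, serializing it into a sidecar file next to the ELF cache would be straightforward; a minimal sketch (the variable contents here are invented for illustration):

```python
import json

# Invented example data: the pandas_categoricals structure described above is
# a list of lists of strings (one list of category names per categorical feature).
pandas_categoricals = [["red", "green", "blue"], ["low", "high"]]

# It round-trips losslessly through JSON, so it could live in a small sidecar
# file next to the ELF cache instead of being baked into the ELF itself.
blob = json.dumps(pandas_categoricals)
assert json.loads(blob) == pandas_categoricals
```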
I might look into this again at some point. I assume there'll either be some way to enable pickling, or I'll serialize the pandas categoricals list somehow, or I'll have a "light" version of pickling, where the model can be pickled but the pickle won't include the compiled function, requiring you to store two files (the pickled model and the ELF cache file).
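The "light" pickling idea could look roughly like this (a sketch, not lleaves code: `LightModel` is hypothetical, and a libc symbol stands in for the compiled function, since ctypes function pointers are exactly what breaks normal pickling):

```python
import ctypes
import ctypes.util
import pickle

class LightModel:
    """Hypothetical sketch: pickle everything except the ctypes function
    pointer, and re-attach the compiled function from the cache file on
    unpickling (so you ship the pickle plus the ELF cache file)."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        self._load_compiled()

    def _load_compiled(self):
        # Stand-in for loading the compiled ELF from self.cache_path:
        # grab any ctypes function pointer (here, libc's abs).
        libc = ctypes.CDLL(ctypes.util.find_library("c"))
        self._fn = libc.abs

    def __getstate__(self):
        # Drop the unpicklable ctypes pointer; keep the plain attributes.
        state = self.__dict__.copy()
        del state["_fn"]
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then reload the compiled function.
        self.__dict__.update(state)
        self._load_compiled()

m = LightModel("/tmp/model.cache")
m2 = pickle.loads(pickle.dumps(m))  # works: the pointer is recreated, not pickled
```

Pickling the raw `_fn` attribute directly would raise the same `ValueError: ctypes objects containing pointers cannot be pickled` from the original question; excluding it via `__getstate__` sidesteps that, at the cost of keeping the cache file around as the second artifact.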
> Love your work on this package. FYI: I get a ~10x speedup compared to the `lightgbm.predict` method, using a lot of categorical variables.
I'm glad to hear lleaves is working for you! :)
I am loading a model with thousands of trees, which takes approx. 10 minutes. Therefore I want to compile the model once and then serialize it to a file. Pickle and dill both give the following error: `ValueError: ctypes objects containing pointers cannot be pickled`. Is there a way to save/load the model to/from disk? Thanks :)