microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Can't read old models saved with saveRDS.lgb.Booster on version 4.3.0 of lightgbm #6560

Open BastienFR opened 2 months ago

BastienFR commented 2 months ago

Description

I have a long-term project that uses lightgbm for modeling. Over the years I have built a number of models, which I reload from time to time to predict on new data. I used to save my models with lightgbm::saveRDS.lgb.Booster() and read them back with lightgbm::readRDS.lgb.Booster(). I’m now trying to run the code with lightgbm version 4.3.0, and it fails, telling me that lightgbm::readRDS.lgb.Booster() doesn’t exist anymore:


m_control_1 <- readRDS.lgb.Booster(here::here("results/models/wind_model_m_control_1.rds"))
# Error in readRDS.lgb.Booster(here::here("results/models/wind_model_m_control_1.rds"))
#    could not find function "readRDS.lgb.Booster"

I found in this issue (https://github.com/rstudio/bundle/issues/55) that the function was removed and that readRDS() should be used instead. When I do so, the model loads with no error:


m_control_1 <- readRDS(here::here("results/models/wind_model_m_control_1.rds"))

However, the object created is unusable:


m_control_1
# LightGBM Model
# (Booster handle is invalid)

var <- jsonlite::fromJSON(m_control_1$dump_model())$feature_names
# Error in m_control_1$dump_model() :
#   Attempting to use a Booster which no longer exists and/or cannot be restored. This can happen if you called Booster$finalize() or if this Booster was saved through saveRDS() using 'serializable=FALSE'.

Is there a way to load old models created with lightgbm::saveRDS.lgb.Booster() into the new version of lightgbm? I guess this could be done either by making readRDS() work with files created by saveRDS.lgb.Booster() or by adding lightgbm::readRDS.lgb.Booster() back into the package for this kind of use case.

Rerunning all my old models would be difficult, and downgrading my version of lightgbm doesn’t sound ideal either.
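A possible recovery path, sketched under assumptions that this thread does not confirm: in the 3.x releases, saveRDS.lgb.Booster() appears to have stored the model's text representation in the Booster's `raw` field before serializing, and readRDS.lgb.Booster() rebuilt the Booster from that text. If that holds for these files, the text may still be present in the object that plain readRDS() returns, and lightgbm::lgb.load(model_str = ...) could rebuild a working Booster from it. The helper extract_model_string() is hypothetical (not part of {lightgbm}); it only checks for that text and returns it.

```r
# Hypothetical helper, not part of {lightgbm}.
# Assumption: a Booster written by lightgbm 3.x's saveRDS.lgb.Booster()
# carries its model text in the `raw` field.
extract_model_string <- function(old_booster) {
  # `[[` works on both lists and environments (R6 objects are environments)
  model_str <- old_booster[["raw"]]
  if (!is.character(model_str) || length(model_str) != 1L || is.na(model_str)) {
    stop("no model text found in `raw`; this object cannot be rebuilt this way")
  }
  model_str
}

# Usage sketch (requires {lightgbm} 4.x; paths follow the example above):
# old <- readRDS(here::here("results/models/wind_model_m_control_1.rds"))
# booster <- lightgbm::lgb.load(model_str = extract_model_string(old))
# lightgbm::lgb.save(booster, here::here("results/models/wind_model_m_control_1.txt"))
```

If the check fails, the file was most likely saved without the model text (for example with `raw = FALSE`), and this route will not work.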

Reproducible example

See the screenshots above. Sorry, I work in a closed environment and can't export objects or code.

Environment info

LightGBM version 4.3.0, installed from CRAN, running on R 4.4.0 on a Windows machine.

jameslamb commented 2 months ago

Thanks for using LightGBM.

The answer to this question will depend on your answers to these questions: what version of R was the model file created with? What version of LightGBM?

And can you please edit the description as follows?

BastienFR commented 2 months ago

Thanks for your response.

what version of R was the model file created with? what version of LightGBM?

Both of those questions are hard to answer, as I don't keep track of the versions of R or of my packages when I run code. The objects date from before 2023, so I would expect R to have been around version 4.2.x and lightgbm around version 3.3.x. It's the best I can do, sorry.


About the formatting, sorry about the doubling of the first sentence; it's a copy-paste error that happened after I reread my post. As for the screenshots, I don't want to get into an argument here. I know my question wasn't up to standard: screenshots should be avoided, and reproducible examples that are easy to copy-paste and machine-readable are what to aim for. However, I work in a corporate environment that has no other way out but screenshots. It was pretty much that or nothing. Making a reproducible example would mean building two different environments, an old one to save a model and a new one to read it, which is time-consuming. I (wrongly?) expected my question to be relatively straightforward, a direct consequence of deprecating readRDS.lgb.Booster(). Feel free to close this issue if you disagree with me or if the description and answers I provided are not sufficient to understand the problem.

jameslamb commented 2 months ago

Both of those questions are hard to answer, as I don't keep track of the versions of R or of my packages when I run code.

Ok thanks. When I have time to look at this again, I'll try to create a reproducible example for you using R 4.2.0 and {lightgbm} 3.3.1.

If you plan on using .rds for long-term storage of any R objects, you should keep track of this information. It is possible that a future version of R will not be able to read .rds objects produced by an older version.
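One way to follow this advice, sketched as a hypothetical helper (save_with_versions() is an illustrative name, not part of {lightgbm} or base R): save the object with saveRDS() and write the R version and package versions to a plain-text sidecar file next to it, so the information stays readable even if the .rds itself no longer is. It uses only base R: R.version.string, utils::packageVersion(), saveRDS() and writeLines().

```r
# Hypothetical helper (illustrative name, not part of {lightgbm}).
# Saves `object` to `file` and records the versions that produced it
# in a plain-text sidecar file named "<file>.versions.txt".
save_with_versions <- function(object, file, pkgs = "lightgbm") {
  # vapply() over a character vector names each result after its package
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  saveRDS(object, file = file)
  # Plain text, so it can be read even if a future R cannot read the .rds
  sidecar <- paste0(file, ".versions.txt")
  writeLines(
    c(
      R.version.string,
      paste(names(versions), versions),
      paste("saved_at", format(Sys.time(), tz = "UTC", usetz = TRUE))
    ),
    sidecar
  )
  invisible(sidecar)
}
```

For the model itself, lightgbm::lgb.save() writes LightGBM's own text model format, which does not depend on R's serialization, so it may be worth saving that file alongside any .rds.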

See:

I work in a corporate environment that has no other way out but screenshots

It's fine. I manually retyped the code from your screenshots. It took me just 30 seconds, and now this issue will be discoverable by others facing the same problem and searching for those error messages.