microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

inf/nan values in model are silently converted to zero when model is round-tripped to file #3941

Closed: mjmckp closed this issue 3 years ago

mjmckp commented 3 years ago

When LightGBM is compiled with VS2017, a round trip of a model to string (or file) via the GBDT::SaveModelToString and GBDT::LoadModelFromString functions silently replaces every inf/nan value in the model with zero.

This happens because GBDT::LoadModelFromString calls CommonC::StringToArray<double> to convert the strings holding the various parts of the model (such as the thresholds), and each element is converted by the helper below (from line 1105 of utils\common.h):

```cpp
template<typename T>
struct __StringToTHelper<T, true> {
  T operator()(const std::string& str) const {
    double tmp;

    // Fast (common) path: For numeric inputs in RFC 7159 format:
    const bool fast_parse_succeeded = fast_double_parser::parse_number(str.c_str(), &tmp);

    // Rare path: Not in RFC 7159 format. Possible "inf", "nan", etc. Fallback to standard library:
    if (!fast_parse_succeeded) {
      std::stringstream ss;
      Common::C_stringstream(ss);
      ss << str;
      ss >> tmp;
    }

    return static_cast<T>(tmp);
  }
};
```

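When the input to this method is "inf", "-inf" or "nan", the fast parser rejects it (these are not RFC 7159 numbers) and the fallback branch is meant to handle it, but instead it silently sets tmp to zero. With the VS2017 standard library, ss >> tmp does not accept these spellings, so the extraction fails; since C++11, a failed numeric extraction stores 0 in the target and sets failbit, and the code never checks the stream state afterwards.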
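A minimal standalone sketch of the failure and of one possible fix, assuming a C++11 (or later) compiler. parse_via_stream mirrors the fallback branch quoted above (it assumes Common::C_stringstream imbues the classic "C" locale). parse_double_checked is a hypothetical helper written for illustration: it recognizes the non-finite spellings explicitly and throws on anything it cannot parse. It is not necessarily what #3942 merged.

```cpp
#include <algorithm>
#include <cctype>
#include <limits>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Mirrors the fallback branch of __StringToTHelper: stream extraction in the
// classic "C" locale, with no check of the stream state afterwards.
double parse_via_stream(const std::string& str) {
  double tmp = 1.0;  // deliberately non-zero, to show a failed extraction overwrites it
  std::stringstream ss;
  ss.imbue(std::locale::classic());
  ss << str;
  // If this extraction fails (as it does for "inf" with the VS2017 library in
  // this report), C++11 requires tmp to be set to 0 and failbit to be set.
  ss >> tmp;
  return tmp;
}

// Hypothetical replacement for the fallback: handle the non-finite spellings
// explicitly (case-insensitively) and reject anything else instead of
// silently returning 0.
double parse_double_checked(const std::string& str) {
  std::string lower(str);
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  if (lower == "inf" || lower == "+inf") return std::numeric_limits<double>::infinity();
  if (lower == "-inf") return -std::numeric_limits<double>::infinity();
  if (lower == "nan" || lower == "-nan") return std::numeric_limits<double>::quiet_NaN();

  double value = 0.0;
  std::istringstream ss(str);
  ss.imbue(std::locale::classic());
  ss >> value;
  // Require a successful extraction that consumed the whole token.
  if (ss.fail() || ss.peek() != std::char_traits<char>::eof()) {
    throw std::invalid_argument("failed to parse double: " + str);
  }
  return value;
}
```

With the VS2017 library described in this report, parse_via_stream("inf") comes back as 0.0 (libstdc++ behaves the same way; other standard libraries may be more permissive), whereas parse_double_checked("inf") returns positive infinity. Throwing on unrecognized input, rather than returning 0, turns a silently corrupted model into a visible load error.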

This bug appears to have been introduced in 792c930305a818db1463911c2cbee92d462eaab9 in Dec 2020.

Reproduction

1) Train a LightGBM model whose trees contain a non-finite value (e.g., inf)
2) Save the model to file
3) Load the model from file
4) Inspect the loaded model and observe that all the non-finite values have been replaced by zero

Environment info

Windows 10, Visual Studio 2017.

StrikerRUS commented 3 years ago

Fixed via #3942.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.