Open mayer79 opened 3 months ago
@jameslamb could you also add a Python tag? There, we would need two things: translating feature names to int positions, and adding the set of skipped features as additional set.
Question to @jameslamb and others: In Python, the interaction constraints are transformed into a string here, just like any other list of list-like object:
https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py,
def _param_dict_to_str(data: Optional[Dict[str, Any]]) -> str:
"""Convert Python dictionary to string, which is passed to C API."""
if data is None or not data:
return ""
pairs = []
for key, val in data.items():
if isinstance(val, (list, tuple, set)) or _is_numpy_1d_array(val):
pairs.append(f"{key}={','.join(map(_to_string, val))}")
elif isinstance(val, (str, Path, _NUMERIC_TYPES)) or _is_numeric(val):
pairs.append(f"{key}={val}")
elif val is not None:
raise TypeError(f"Unknown type of parameter:{key}, got:{type(val).__name__}")
return " ".join(pairs)
Is it an option to add a snippet like this?
if key == "interaction_constraints":
val = _prep_interaction_constraints(vals, feature_names)
_prep_interaction_constraints()
would do two things:
Both would need access to the list of feature_names, so we would need to either pass feature_names = None
or the Dataset object to _param_dict_to_str()
. Not sure if this the right way to proceed?
could you also add a Python tag?
We don't actually have a python
label but probably should. I've at least edited the PR title for now.
hmmmm @btrotta @guolinke if interaction_constraints
is provided, should features that do not appear be excluded from training?
At first I agreed with @mayer79 's assessment that they should not be excluded, but seeing this docstring makes me think they're intentionally excluded:
Is it an option to add a snippet like this?
I'd prefer that that be done outside of _param_dict_to_str()
, similar to how categorical_feature
is resolved:
I'd support it being a private function, so it can be shared between cv()
and train()
.
I'm generally supportive of everything you've proposed at the top of this issue, thanks for working on it and for adding that context about consistency with XGBoost and scikit-learn!
Let's just please wait to hear from @btrotta or @guolinke about whether omitting features not mentioned in interaction_constraints
was intentional, there might be something from #2884 and #3130
Sorry, I don't remember about that 🤣, I think it was not intentional. We can align with XGBoost.
😂 ok thank you, I don't remember either. Agree then, let's move forward @mayer79
Is it an option to add a snippet like this?
I'd prefer that that be done outside of
_param_dict_to_str()
, similar to howcategorical_feature
is resolved:I'd support it being a private function, so it can be shared between
cv()
andtrain()
.
Thanks for the hint. I will soon draft the function and then we determine where to place it :-).
Interaction constraints are extremely useful in practice. The API in LightGBM has two aspects that could be improved:
[[0], [1, 2]]
. We could also allow feature names as in XGBoost, or in the R-package of LightGBM.[[0]]
, or as[["living_area"]]
.Both aspects help to reduce user mistakes and aligns the API with XGBoost (and partly with Scikit-Learn's histogram gradient boosting implementation).
I can make two PRs: one for the R-package, and one for the Python package.
@jameslamb @btrotta