Closed jameslamb closed 2 years ago
I agree with all the proposed changes. Not only will this make the package easier to maintain, it will also make it easier for users to work with. 👍
This work is now complete. See the list of linked pull requests above for details.
Thanks very much @StrikerRUS for thorough reviews of so many PRs!
@jameslamb Thanks a lot for splitting the work into many small PRs! It was a pleasure to review them.
Summary
The following changes should be made to `lgb.Dataset()` in the R package.

"deprecated" = "supported, but raises a warning if used".
In release 3.3.0 (#4310)

- deprecate argument `info` in `lgb.Dataset()`
- add arguments `group`, `weight`, `init_score`, and `label` to `lgb.Dataset()`
- deprecate argument `...` in `lgb.Dataset()` (https://github.com/microsoft/LightGBM/issues/4226#issuecomment-829473584)
- deprecate `Dataset$getinfo()` (with a warning its name will be changed to `get_field()`)
- deprecate `Dataset$setinfo()` (with a warning its name will be changed to `set_field()`)
- add `Dataset$get_field(field_name)` to `Dataset`, matching the Python package
- add `Dataset$set_field(field_name, data)` to `Dataset`, matching the Python package

In release 4.0.0
- remove `...` from `lgb.Dataset()` (#4874)
- remove `Dataset$getinfo()` (#4864)
- remove `info` from `lgb.Dataset()` (#4866)
- remove `Dataset$setinfo()` (#4854)
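As a rough illustration of the interface described above, here is a minimal sketch of passing fields as top-level arguments and reading or updating them with `get_field()` / `set_field()`. It assumes a version of the `{lightgbm}` R package where these changes have landed (3.3.0 or later); the toy data and the variable names (`X`, `y`, `w`, `dtrain`) are made up for the example.

```r
library(lightgbm)

set.seed(42)
n <- 50L
X <- matrix(rnorm(n * 4L), nrow = n, ncol = 4L)  # toy features (assumption)
y <- as.numeric(X[, 1L] > 0)                     # toy binary labels
w <- rep(1.0, n)                                 # uniform sample weights

# Fields are passed as keyword arguments, not through `info` or `...`
dtrain <- lgb.Dataset(
  data = X,
  label = y,
  weight = w,
  init_score = rep(0.0, n)
)

# Fields are read back from a constructed Dataset
lgb.Dataset.construct(dtrain)

labels <- dtrain$get_field("label")

# Overwrite a field after construction
dtrain$set_field("weight", w * 2.0)
new_weights <- dtrain$get_field("weight")
```

With this shape there is only one place to supply each field, so the question of which `init_score` wins never comes up.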
Motivation

- `weight`, `init_score`, etc. will match keyword args and not be part of `...`
- avoids questions like "what happens if there is one `init_score` as an argument passed through `...` and a different `init_score` in the `info` list?"

Description
LightGBM training involves some preprocessing, such as bucketing continuous features into histograms and filtering out unsplittable features. That work is done once, before training begins, during construction of a `Dataset` object.

In addition to the raw data (i.e. features), LightGBM `Dataset` objects can also contain the following:

- `label` = an array of values for the target (e.g. 0s and 1s for binary classification)
- `weight` = an array of sample weights, used to tell LightGBM that some samples should be considered more important during training
- `group` = a vector of integers, describing how samples should be grouped together into "query results" (only relevant in the learning-to-rank task)
- `init_score` = a matrix of per-sample initial scores to boost from. This can be used, for example, to start the boosting process from predictions created by another model.

References
- `Dataset` class on the Python side: https://github.com/microsoft/LightGBM/blob/8a90ea3f267a81a529e3f069cc13e0f6320e7989/python-package/lightgbm/basic.py#L1122-L1128

Other Notes
Sorry I didn't write this up sooner. I didn't really think of it until I started working on adding deprecation warnings for uses of `...` (e.g. in #4522).

@Laurae2 and I have already talked about this privately, but I would still like to open this as a Request for Comment (RFC) to give everyone who's interested a chance to voice their opinions.