microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
16.74k stars 3.84k forks source link

[R-package] use keyword arguments for all internal function calls #3390

Closed jameslamb closed 4 years ago

jameslamb commented 4 years ago

Summary

Today in the R package, there are a lot of internal function calls which use only positional arguments. Change them to use keyword arguments for extra safety.

I've added this issue to provide a small, focused contribution opportunity for Hacktoberfest 2020 participants. If you are an experienced open source contributor, please leave this for beginners and consider contributing on a different backlog item.

Motivation

Using positional arguments in internal code is dangerous, because it can allow mistakes to slip through or can lead to confusing errors.

This is especially dangerous for cases where the values you'd pass to two parameters take on similar values. For example, in lightgbm(), eval_freq and early_stopping_rounds are both positive integers and can take on similar values. You might not known, unless you're very careful, that you accidentally specified them in the wrong order.

How to Contribute

To help with this issue, propose a pull request which replaces uses of positional arguments in the R package with keyword arguments.

That means changing calls like this:

# before
folds <- generate.cv.folds(
    nfold
    , nrow(data)
    , stratified
    , getinfo(data, "label")
    , getinfo(data, "group")
    , params
)

# after
folds <- generate.cv.folds(
    nfold = nfold
    , nrows = nrow(data)
    , stratified = stratified
    , label = getinfo(data, "label")
    , group = getinfo(data, "group")
    , params = params
)
  1. Only change code in R-package/R/
  2. Before submitting your PR, re-generate the documentation. Be sure you have {roxygen2} 7.1.1 installed. If you have any issues with this, just ask when you submit your PR and I can regenerate the docs for you
    sh build-cran-package.sh
    R CMD INSTALL --with-keep.source lightgbm_3.*.tar.gz
    cd R-package
    Rscript -e "roxygen2::roxygenize(load = 'installed')"
  3. Do not modify any calls to lgb.call()
  4. Do not worry about any function calls from the standard library (like print(), length(), is.character(), etc.)
  5. If adding these arguments makes lines too long, use comma-first style with one argument per line
    result <- some_function(
        arg1 = TRUE
        , arg2 = 4
        , arg3 = "hello"
        , ...
    )

Refer to #3391 as an example. If you have any questions, tag me on this issue and I can help.

Assignments

~If you are interested, please comment on this issue and indicate which file in R-package/R/ you'd like to help on. PLEASE ONLY TAKE ONE FILE, so that multiple people can use this as a learning experience.~

~This issue is not urgent, so you can claim a file now and not contribute until Hacktoberfest begins on October 1st.~

The list below tracks the assignments and completions so far.

Thanks for contributing to LightGBM!

philip-khor commented 4 years ago

can I claim callback.R please?

iadi7ya commented 4 years ago

Hi @jameslamb , I'm working on lgb.train.R. Thanks!

jameslamb commented 4 years ago

thanks @iadi7ya ! I appreciate you taking the time to contribute to {lightgbm}!

AnshuTrivedi commented 4 years ago

@jameslamb I am working on lgb.Predictor.R

Pey-crypto commented 4 years ago

can i claim lgb.Booster.R?

jameslamb commented 4 years ago

can i claim lgb.Booster.R?

yes please!

zenggyu commented 4 years ago

Hi, @jameslamb , I am working on lgb.model.dt.tree.R.

jameslamb commented 4 years ago

Thanks! Now that Hacktoberfest is over we'd like to finish this task, so feel free to take several files if you'd like.

mfrasco6 commented 4 years ago

Hi, Mr. @jameslamb, I am working on utils.R.

zenggyu commented 4 years ago

Since Hacktoberfest is over, I want to claim all of the rest of files.

jameslamb commented 4 years ago

they're yours, thanks! Please bunch the contributions into one or two pull requests.

georgiazoller commented 4 years ago

I am running a multivariate regression using my own data from my qualtrics survey. In the survey I did a treatment (author vs no author), with outcome being the dependent, and age and party being the independent variables. This is my code: model.1 <- lm(Outcome ~ NoAuthor + PARTY + AGE, data=Party_Lines) summary (model.1)

I then get this error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'y' In addition: Warning message: In storage.mode(v) <- "double" : NAs introduced by coercion

Help

jameslamb commented 4 years ago

@georgiazoller that does not look related to LigtGBM. If it is, please open a new issue instead of commenting on this one.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.