microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
16.69k stars 3.83k forks source link

[R-package] Add pkgdown documentation support #1143

Closed Laurae2 closed 5 years ago

Laurae2 commented 6 years ago

I will be adding pkgdown support to the documentation of LightGBM.

You can check the branch here: https://github.com/Laurae2/LightGBM/tree/pkgdown

To add support for pkgdown, it will require the following tasks:

Extras:

I'm sharing my setup for running pkgdown quickly:

# Load useful libraries for development
library(devtools) # install.packages("devtools")
library(roxygen2) # devtools::install_github("klutometis/roxygen")
library(pkgdown) # devtools::install_github("hadley/pkgdown")

# Set the working directory to where I am
setwd("E:/GitHub/LightGBM/R-package")

# Install package
install()

# Generate documentation
document()

# Check for errors
# devtools::check(document = FALSE) # fails/crashes on LightGBM

# Build static website
pkgdown::build_site(run_dont_run = TRUE) # currently fails/crashes

# Remember to cleanup R-package/src folder after testing (keep only install.libs.R, Makevars, Makevars.win)
# Remember to delete R-package/docs after testing

It will look like this:

image

Too much advanced example, gives a good baseline of what we can do with pkgdown: https://keras.rstudio.com/index.html

image

image

Another (very simplified) example (live): https://laurae2.github.io/LauraeDS/

BruceZhaoR commented 6 years ago

@Laurae2 For Travis support, you can see: https://github.com/ropenscilabs/tic#example-applications . I would like to contribute to the LightGBM, but I need some time to get familiar to the package functions.

guolinke commented 6 years ago

@Laurae2 Does pkgdown only work for R package ?

Laurae2 commented 6 years ago

@guolinke pkgdown is only for R, I don't know if there are equivalents existing for Python

BruceZhaoR commented 6 years ago

I guess pkgdown runs just like jekyll, you can see the template, and the function render_template and the whisker package.

If you want to build a Python package html pages, maybe pystache can give you some hints ☺️

guolinke commented 6 years ago

@BruceZhaoR thanks for interesting in contribution. You are very welcome, and feel free to ask us if you met any problems.

Laurae2 commented 6 years ago

Update: pkgdown successfully compiles the website.

I had to do many workarounds to avoid crashes for that but it does work now.

PR coming soon.

Examples:

image

image

image

guolinke commented 6 years ago

@Laurae2 any updates ?

Laurae2 commented 6 years ago

@guolinke Didn't find time to work on it currently, this month I have more free time so I might try to do it again.

BruceZhaoR commented 6 years ago

@Laurae2 I just used pkgdown built my package website: https://brucezhaor.github.io/pjutils/index.html Want to know if you need any help now?

jameslamb commented 5 years ago

@Laurae2 what's the status of this issue? Is it just the Travis item remaining? How is the pkgdown site being published / where is it hosted?

I could take over finishing what's left for this if you can let me know the answers to those few things.

Thanks!

Laurae2 commented 5 years ago

@BruceZhaoR that looks great, do you think you could help to make one LightGBM?

@jameslamb I think the following must re-done from scratch:

Run successfully pkgdown::build_site() on the R-package folder without vignettes Convert demos to vignettes Find workarounds for all function examples which crashes and cannot have a direct or immediate workaround Fix all instances of crashes in vignettes Find workarounds for all vignettes which crashes and cannot have a direct or immediate workaround Run successfully pkgdown::build_site() on the R-package folder with vignettes

For the following:

How is the pkgdown site being published / where is it hosted?

Unless we have a good free hosting somewhere, I can host it.

BruceZhaoR commented 5 years ago

@Laurae2 @jameslamb @guolinke I use the raw lightgbm_r fold and Appveyor to build the package and docs, then push the docs to gh-pages : https://brucezhaor.github.io/lightgbm_r/index.html https://brucezhaor.github.io/lightgbm_r/reference/lgb.plot.importance.html here is the demo repo: https://github.com/BruceZhaoR/lightgbm_r

to-do List

BruceZhaoR commented 5 years ago

https://ci.appveyor.com/project/BruceZhaoR/lightgbm-r/build/artifacts The windows zip(x64) works fine on my computer.

git bash command:

R CMD INSTALL lightgbm_2.2.3.zip
Rscript.exe -e 'library(lightgbm);demo("basic_walkthrough")'

Maybe Windows users can download the zip here.

BruceZhaoR commented 5 years ago

Wait for @Laurae2 #1944

BruceZhaoR commented 5 years ago

@Laurae2 @jameslamb any ideals? I can make a PR.

Laurae2 commented 5 years ago

@BruceZhaoR I think it is better we merge your changes, then I can build on top of it. Otherwise we might duplicate work.

StrikerRUS commented 5 years ago

Is anybody here familiar with conda R environment setup? In case we can manage to build LightGBM R package inside a conda environment, we can build the documentation directly on RTD site along with all other our docs without the need to host GH pages and separate Appveyor job.

I tried to do this today, but without any luck. I installed r-base, r-essentials, make, cmake, m2w64-gcc (I was using Windows) and performed some fixes to make everything work together.

The error after which I abandoned this idea happened on this line of @BruceZhaoR's pipeline. It was

...
cmTC_0b207/fast           0 [main] make 2328 find_fast_cwd: WARNING: Couldn't compute FAST_CWD pointer.  Please report this problem to     the public mailing list cygwin@cygwin.com     /Scripts/make -f CMakeFiles\cmTC_0b207.dir\build.make CMakeFiles/cmTC_0b207.dir/build
...
  The C compiler is not able to compile a simple test program.

and some words about unknown directory, which led me to the following MSYS issue and this fix which seems to be not included in MSYS from conda. Maybe on Linux this will be less painful (RTD's Docker is based on Ubuntu).

jameslamb commented 5 years ago

😱

I have used R conda environment to get around Travis not have great built-in support for MacOS + Python (https://github.com/jameslamb/doppel-cli/blob/master/.ci/setup.sh)

Based on that line, are you saying this is failing on installing LightGBM R package itself? Or some dependency for docs?

I'm willing to help with this πŸ˜€

jameslamb commented 5 years ago

(p.s. here is the background, in case you're wondering "why did James say he uses conda R environment to solve a Python issue")

Laurae2 commented 5 years ago

@jameslamb It seems to be a Windows specific issue with MSYS in conda where compiling fails due to a missing pointer from the compiler (not from the compiled software). @StrikerRUS for RTD can you use R without conda if for Windows or is it required to have conda? Most Windows users will use the standard R installation.

StrikerRUS commented 5 years ago

@jameslamb @Laurae2 Thanks for your replies!

@jameslamb Thanks a lot for your links! I'll try to use CI to find out are any other MSYS issues presented in conda's R environment when I have time for it. Also, any help is appreciated, because it's quite uncomfortable to investigate it with CI, local Linux environment will be easier and faster πŸ˜ƒ .

@Laurae2 I'm talking just about the possibility of building the docs on RTD site directly, not about any CI tests or producing R-artifacts for users from this (ugly conda) R environment. So, we just need to be sure that this environment is sufficient for building docs, which requires building the package itself, and not need to worry about any compatibility issues on users' side.

@Laurae2 RTD provides a few variants for us. Here what we have:

@Laurae2 Speaking about normal standard R installation, I've already told @jameslamb but you don't know yet that I successfully configured R environment at Travis and Appveyor for another project. We can borrow that code for testing purposes of LightGBM R-package. But I think that it falls outside the scope of this issue, we can create another one to discuss this.

StrikerRUS commented 5 years ago

OK, seems that R from conda on Linux is less broken compared to one on Windows. πŸ˜„ So, I had some success in the setup of R docs with the help of pkgdown on RTD site. Everyone who is interested in helping me is welcomed in #2176.

StrikerRUS commented 5 years ago

As requested by @jameslamb, I'm posting here a TO-DO list, which was removed from the source code.

After merging #2176, it is possible to replace build_r_site.R scrip with one single call of pkgdown::build_site(run_dont_run = FALSE, ...)

To do this the following sub-tasks should be done first:

StrikerRUS commented 5 years ago

Finally #2176 has been merged recently! πŸŽ‰

See https://github.com/microsoft/LightGBM/issues/1143#issuecomment-493770556 for actions required to resolve this issue completely.

Now this issue entirely depends on #1944. After rewriting demos it should be rather straightforward replace build_r_site.R file with single pkgdown::build_site() call.

Also, here are some known issues:

  1. UPD: fixed in #2387 ~broken links for sources of the R-API due to that R-package is located in a sub-folder at GitHub. As I know, it's not fixable unfortunately.~ image

  2. UPD: not an issue (https://github.com/microsoft/LightGBM/issues/1143#issuecomment-528691358) ~strange rendering of dimnames function~ image

  3. incomplete list of authors at the main page image

  4. Warning: Topics missing from index: lgb_shared_params, lightgbm. Refer to https://github.com/r-lib/pkgdown/issues/1132

  5. UPD: fixed in #2370 ~wrong version type: our master branch is not "Released version"~ image

  6. UPD: it's actually an issue about exposing callbacks (https://github.com/microsoft/LightGBM/pull/2443#issuecomment-536315297) ~callback.R is not documented~

StrikerRUS commented 5 years ago

According to the policy from #2302, I'm closing this issue as I'm not working on it anymore.

StrikerRUS commented 5 years ago

@jameslamb Can you please take a look at known issues 2 and 4? I have an intuition that they are pretty easy to fix, but unfortunately my lack of knowledge of R doesn't allow me to do it.

jameslamb commented 5 years ago

On number 4...I will take a look. both lightgbm and lgb_shared_params are things I introduced to consolidate some duplicated documentation, but maybe roxygen2 isn't playing nicely with them. I'll try to take a closer look.

For number 2, what do you think is wrong with it? That looks like a faithful representation of the code. It just looks weird because R thinks those are S3 methods, not plain-old functions. That is the danger of choosing function names with dots in them (a practice that is generally discouraged for R code) but now that we have them, I don't think there is much we can do.

StrikerRUS commented 5 years ago

@jameslamb

For number 2, what do you think is wrong with it?

I worried about duplicated entry for one method (its' appearance differs from all other methods) and ` signs around the name. In short, is it OK? πŸ˜ƒ

StrikerRUS commented 5 years ago

Just noticed that callback.R is not documented at all. Adding this to known issues.

jameslamb commented 5 years ago

@jameslamb

For number 2, what do you think is wrong with it?

I worried about duplicated entry for one method (its' appearance differs from all other methods) and ` signs around the name. In short, is it OK? πŸ˜ƒ

yes I think it is ok! The backticks are actually valid R code in that case (they are the way that you create symbols in R which use characters like < that have other meaning)

BruceZhaoR commented 5 years ago

@StrikerRUS

  1. incomplete list of authors at the main page image

For number 3, the are two ways to fix it:

1. Modify the DESCRIPTION, Auhtors@R part, add aut/cre/fnd to role. Because these 3 roles are the main auhtors, and will display in the sitebar part.

Authors@R: c(
    person("Guolin", "Ke", email = "guolin.ke@microsoft.com", role = c("aut", "cre")),
    person("Damien", "Soukhavong", email = "damien.soukhavong@skema.edu", role = c("aut", "ctb")),
    person("Yachen", "Yan", role = c("aut", "ctb")),
    person("James", "Lamb", email="james.lamb@uptake.com", role = c("aut","ctb"))
    )

?person
# `aut`: (Author) Use for full authors who have made substantial contributions to the package and should show up in the package citation.
# `cre`: (Creator) Use for the package maintainer.
# `com`: (Compiler) Use for persons who collected code (potentially in other languages) but did not make further substantial contributions to the package.
# `fnd`: (Funder) Use for persons or organizations that furnished financial support for the development of the package.
# `ctb`: (Contributor) Use for authors who have made smaller contributions (such as code patches etc.) but should not show up in the package citation.

due to the pkgdown code:
https://github.com/r-lib/pkgdown/blob/b6eec62574f1a0087c70eeb6574d6af0435828ff/R/build-home-authors.R#L9-L11 https://github.com/r-lib/pkgdown/blob/b6eec62574f1a0087c70eeb6574d6af0435828ff/R/build-home-authors.R#L60-L70

related issue: https://github.com/r-lib/pkgdown/issues/450

2. Add sidebar html code in the _pkgdown.yml file. This way did not need to add roles to the persons and can custom the sidebar content.

home:
  sidebar: "<div class='links'> <h2>Links</h2> <ul class='list-unstyled'> <li>Browse source code at <br /><a href='https://github.com/Microsoft/LightGBM'>https://&#8203;github.com/&#8203;Microsoft/&#8203;LightGBM</a> </li> <li>Report a bug at <br /><a href='https://github.com/Microsoft/LightGBM/issues'>https://&#8203;github.com/&#8203;Microsoft/&#8203;LightGBM/&#8203;issues</a> </li> </ul> </div> <div class='license'> <h2>License</h2> <ul class='list-unstyled'> <li><a href='https://opensource.org/licenses/mit-license.php'>MIT</a> + file <a href='LICENSE-text.html'>LICENSE</a></li> </ul> </div> <div class='developers'> <h2>Developers</h2> <ul class='list-unstyled'> <li><a href='https://github.com/guolinke'><img src='https://avatars0.githubusercontent.com/u/16040950?s=400&v=4' height='48' /> Guolin Ke</a> <br /> <small class='roles'> Author, maintainer </small> </li> <li><a href='https://github.com/Laurae2'><img src='https://avatars1.githubusercontent.com/u/9083669?s=460&v=4' height='48' /> Damien Soukhavong</a> <br /> <small class='roles'> Contributor </small> </li> <li><a href='https://github.com/yanyachen'><img src='https://avatars1.githubusercontent.com/u/6893682?s=460&v=4' height='48' /> Yachen Yan</a> <br /> <small class='roles'> Contributor </small> </li> <li><a href='https://github.com/jameslamb'><img src='https://avatars1.githubusercontent.com/u/7608904?s=400&v=4' height='48' /> James Lamb</a> <br /> <small class='roles'> Contributor </small> </li> <li><a href='authors.html'>All authors...</a></li> </ul> </div>"

due to the pkgdown code:
https://github.com/r-lib/pkgdown/blob/b6eec62574f1a0087c70eeb6574d6af0435828ff/R/build-home-index.R#L56-L69

StrikerRUS commented 5 years ago

@BruceZhaoR Thanks a lot for your investigation! Ah, they have role filter without the possibility to overwrite it, nice! https://pkgdown.r-lib.org/reference/build_home.html#yaml-config-authors

I saw someone tried to specify roles in _pkgdown.yml, but seems it's pointless. https://github.com/DeclareDesign/fabricatr/blob/cf0dfe71e9ff064c09a03330f5f805b960b18839/_pkgdown.yml#L13-L17

For number 3, the are two ways to fix it:

I'm leaving this for R maintainers decision.

jameslamb commented 4 years ago

I think that the current state of our pkgdown stuff, including new changes from #3072 , is sufficient to keep this issue closed. I've marked this :ballot_box_with_check: in #2302 as I don't think any additional work is required

pkgdown site: https://lightgbm.readthedocs.io/en/latest/R/index.html

For more targeted recommended fixes, like #3036 , please create new issues.

StrikerRUS commented 4 years ago

@jameslamb Please comment on this: https://github.com/microsoft/LightGBM/issues/1143#issuecomment-493770556.

I thought this issue should be closed after that we can simplify our build_r_site.R script up to single call of pkgdown::build_site(). We can't (?) do this right now due to remaining TODOs from the comment above.

jameslamb commented 4 years ago

Ah ok, apologies for missing that commentt.

I'm confused by that comment @StrikerRUS ,can you clarify?

I just ran this and it produced a site that looks identical to our current site:

Rscript build_r.R
cd lightgbm_r
Rscript -e "pkgdown::build_site(install = FALSE)"

If I understand correctly, https://github.com/microsoft/LightGBM/issues/1143#issuecomment-493770556 is actually three areas for improvement that should be documented in their own issues

If you agree with my comment above, I'll break these into individual issues. That will be a lot more manageable for contributors than parsing the comments in this issue, I think.

StrikerRUS commented 4 years ago

@jameslamb Thanks a lot for your explanation! Actually, I thought that not having those sections in R-package will cause a failure in site building process. In other words, they are required. Now I understand that they are optional.

I'm absolutely OK with breaking this meta-issue into individual ones! But given that you've said, I believe it is enough to have only #1944.

Can we now replace https://github.com/microsoft/LightGBM/blob/5cc38e6aa363e6d04a62bcc5e36ea04ca36e917f/build_r_site.R#L10-L20 with build_site() as it doesn't crash without some sections? And after resolving #1944 we'll get articles section at RTD automatically.

jameslamb commented 4 years ago

I think so, yes! Once we merge #3086 , can you leave the fix/remove-devtools branch enabled on readthedocs?

I'll try pushing a change that replaces build_r_site.R with Rscript -e "pkgdown::build_site(install = FALSE)".

Thanks for all your work getting the R documentation set up!

StrikerRUS commented 4 years ago

Sure! Thank you!

jameslamb commented 4 years ago

^ #3098 removes build_r_site.R