microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
16.35k stars · 3.8k forks

Feature Requests & Voting Hub #2302

Open guolinke opened 4 years ago

guolinke commented 4 years ago

This issue is to maintain all feature requests on one page.

Note to contributors: If you want to work on a requested feature, re-open the linked issue. Everyone is welcome to work on any of the issues below.

Note to maintainers: All feature requests should be consolidated on this page. When a new feature request issue is opened, close it and add a new entry here with a link to the issue. The one exception is issues marked good first issue: these should be left open so they are discoverable by new contributors.

Call for Voting

We would like to call a vote here to prioritize these requests. If a feature request is important to you, you can vote for it as follows:

  1. Get the issue (feature request) number.
  2. Search for that number in this issue to check whether a vote for it already exists.
  3. If the vote exists, add a 👍 reaction to it.
  4. If it doesn't, create a new vote by replying to this thread and including the number in your reply.

Discussions


Efficiency related


Effectiveness related


Distributed platform and GPU (OpenCL-based and CUDA)


Maintenance

Python package:

R package:


New features

New algorithms:

Objective and metric functions:

Python package:

Dask:

R package:

New language wrappers:

Input enhancements:

onacrame commented 4 years ago

There’s a reference to minimum variance sampling here:

https://catboost.ai/docs/concepts/algorithm-main-stages_bootstrap-options.html

Although I think it just speeds up training rather than providing out-of-core training.
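For context, a pure-Python sketch of the Minimum Variance Sampling idea described in the CatBoost docs linked above (an illustration only, not LightGBM code; the function name and the scaling heuristic are my own): each example is kept with probability proportional to its regularized gradient magnitude, so small-gradient rows are dropped more often while the sample stays low-variance.

```python
import math
import random

# Illustration of the MVS sampling rule: keep each example with probability
# proportional to sqrt(g^2 + lambda), scaled so the expected kept fraction
# is roughly sample_rate.
def mvs_keep_indices(gradients, sample_rate, lam=1.0, seed=0):
    rng = random.Random(seed)
    n = len(gradients)
    scores = [math.sqrt(g * g + lam) for g in gradients]
    total = sum(scores)
    return [i for i, s in enumerate(scores)
            if rng.random() < min(1.0, s * sample_rate * n / total)]

kept = mvs_keep_indices([0.9, -0.05, 0.4, -0.8, 0.01], sample_rate=0.6)
```

Because the per-row keep probability grows with the gradient magnitude, the surviving subset concentrates on the examples that matter most for the next boosting step, which is why it speeds up training rather than reducing memory use.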

momijiame commented 4 years ago

I would like to tackle the following issues in the Python package. Could we discuss a plan to fix them? Also, where should we discuss it? IMHO, they can be resolved by improving the lightgbm.cv() function.

#2105: Make _CVBooster public for better stacking experience

#283: Keep cv predicted values

I want to reopen the above issues, but I cannot do that; maybe I don't have permission.

StrikerRUS commented 4 years ago

@momijiame Thank you for your interest! I've unlocked those issues for commenting. Please let's continue the discussion there.

guolinke commented 3 years ago

We would like to call a vote here to prioritize these requests. If a feature request is important to you, you can vote for it as follows:

  1. Get the issue (feature request) number.
  2. Search for that number in this issue to check whether a vote for it already exists.
  3. If the vote exists, add a 👍 reaction to it.
  4. If it doesn't, create a new vote by replying to this thread and including the number in your reply.

StrikerRUS commented 3 years ago

we would like to call the voting here

Let me start.

#2644

candalfigomoro commented 3 years ago

It was proposed by me, so I'm a little bit biased:

Decouple boosting types #3128

candalfigomoro commented 3 years ago

GPU binaries release #2263

candalfigomoro commented 3 years ago

Enhance parameter tuning guide with more params #2617

candalfigomoro commented 3 years ago

Subsampling rows with replacement #1038

candalfigomoro commented 3 years ago

Piece-wise linear tree #1315 (also see PR https://github.com/microsoft/LightGBM/pull/3299)

candalfigomoro commented 3 years ago

Multi-output regression #524

candalfigomoro commented 3 years ago

Cox Proportional Hazard Regression #1837

jameslamb commented 3 years ago

Based on https://github.com/microsoft/LightGBM/issues/2983#issuecomment-722630931, I've updated this issue's description:

Note to maintainers: All feature requests should be consolidated on this page. When a new feature request issue is opened, close it and add a new entry here with a link to the issue. The one exception is issues marked good first issue: these should be left open so they are discoverable by new contributors.

I think that we should keep good first issue issues open, so it's easy for new contributors to find them.

gzls90 commented 3 years ago

Read from multiple files #2031

gzls90 commented 3 years ago

Parquet file support #1286

gzls90 commented 3 years ago

Register custom objective / loss function #3244

gzls90 commented 3 years ago

Object importance #1460

wenmin-wu commented 3 years ago

Read from multiple zipped LibSVM-format text files
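Until something like this is supported natively, a stdlib workaround sketch (the helper name and file layout are my own): decompress the gzipped LibSVM parts and concatenate them into a single text file, which LightGBM can then load directly.

```python
import gzip
import tempfile
from pathlib import Path

# Concatenate several gzipped LibSVM text files into one plain-text file.
def merge_gzipped_libsvm(parts, out_path):
    with open(out_path, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as f:
                for line in f:
                    out.write(line if line.endswith(b"\n") else line + b"\n")

# Demo with two tiny gzipped parts.
tmp = Path(tempfile.mkdtemp())
for i, rows in enumerate([[b"1 1:0.5 3:1.2"], [b"0 2:0.7"]]):
    with gzip.open(tmp / f"part{i}.svm.gz", "wb") as f:
        f.write(b"\n".join(rows) + b"\n")
merge_gzipped_libsvm(sorted(tmp.glob("part*.svm.gz")), tmp / "merged.svm")
```

The merged file can then be passed to `lgb.Dataset(str(tmp / "merged.svm"))` as usual; the cost is one extra pass over the data and the temporary decompressed copy on disk.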

gzls90 commented 3 years ago

Multiple GPU support #620

7starsea commented 3 years ago

Multiple GPU support (#620) (From my experience, XGBoost with GPU seems faster than LightGBM with GPU.)

StrikerRUS commented 3 years ago

For everyone who was voting for multi-gpu support, please try our new experimental CUDA version which was kindly contributed by our friends from IBM. This version supports multi-GPU training. We will really appreciate any early feedback on this experimental feature (please create new issues, do not comment here).

How to install: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version-experimental.

Argument to specify number of GPUs: https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_gpu.
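Putting the two links above together, a minimal parameter sketch for the experimental CUDA build (`device_type` and `num_gpu` are the documented parameters; the objective and everything else here are placeholders):

```python
# Hedged configuration sketch for the experimental CUDA version.
# Requires a LightGBM build compiled with CUDA support, as described in
# the Installation Guide linked above.
params = {
    "objective": "binary",
    "device_type": "cuda",  # select the experimental CUDA backend
    "num_gpu": 2,           # number of GPUs to use for training
}
```

These params would then be passed to `lgb.train(params, train_set)` as usual; on a non-CUDA build, training with `device_type="cuda"` fails at startup rather than silently falling back.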

jz457365 commented 2 years ago

Support ignoring some features during training on constructed dataset #4317

jz457365 commented 2 years ago

Spike and slab feature sampling priors (feature weighted sampling) #2542

bethrice44 commented 2 years ago

Quantile LightGBM: ensure monotonic #3447
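A common post-hoc workaround for the crossing-quantiles problem behind #3447 (an illustration, not a LightGBM feature; the function name is my own): train one model per quantile level, then sort each row's predictions across the quantile dimension ("rearrangement") so the estimated quantiles cannot cross.

```python
# Rearrangement sketch: given one prediction list per quantile level
# (ordered by level), sort each row's values across quantiles so the
# resulting quantile estimates are monotone non-decreasing.
def rearrange(preds_per_quantile):
    return [sorted(row) for row in zip(*preds_per_quantile)]

# Two rows, two quantile levels; the second row's q10 > q90 gets repaired.
print(rearrange([[1.0, 2.0], [0.5, 3.0]]))  # [[0.5, 1.0], [2.0, 3.0]]
```

This fixes crossing after the fact but does not make each individual model aware of the constraint, which is why native support keeps getting requested.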

robo-sq commented 2 years ago

SHAP feature contribution for linear trees #4002

ira-saktor commented 2 years ago

Create dataset from pyarrow tables: #3369

js850 commented 1 year ago

Add support for CRLF line endings or improve documentation and error message #5508
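Until #5508 lands, a stdlib workaround sketch (the helper name is my own): rewrite a data file with Unix line endings before handing it to LightGBM.

```python
import tempfile
from pathlib import Path

# Normalize CRLF ("\r\n") line endings to LF ("\n") in a data file.
def normalize_line_endings(src, dst):
    Path(dst).write_bytes(Path(src).read_bytes().replace(b"\r\n", b"\n"))

# Demo on a tiny CRLF-terminated CSV.
tmp = Path(tempfile.mkdtemp())
(tmp / "crlf.csv").write_bytes(b"1,0.5\r\n0,0.7\r\n")
normalize_line_endings(tmp / "crlf.csv", tmp / "lf.csv")
```

The rewritten file is then safe to pass to the native data loader regardless of the platform the file was produced on.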

thomaslundgaard commented 1 year ago

Optimisations for Apple Silicon #3606

antaradas94 commented 1 year ago

Add parameter to control maximum group size for Lambdarank  #5053

chopeen commented 1 year ago

Allow training without loading full dataset into memory #5094

chopeen commented 1 year ago

Support different data types (when load data from Python) #3459

szjunma commented 1 year ago

Add support for early stopping in Dask interface #3712

vitorpbarbosa7 commented 1 year ago

Add Earth Mover Distance as objective metric to be optimized (maximized) #1256
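For reference, in one dimension the Earth Mover Distance between two equal-size, equal-weight samples reduces to the mean absolute difference of their sorted values. A minimal sketch of the metric itself (illustration only; a real LightGBM objective would additionally need gradients and Hessians):

```python
# 1-D Earth Mover's Distance between two equal-size, equal-weight samples:
# sort both samples and average the absolute pairwise differences.
def emd_1d(a, b):
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

print(emd_1d([0.0, 1.0], [1.0, 2.0]))  # prints 1.0
```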

tim-habitat commented 1 year ago

Apache Arrow seems to be gaining a lot of traction in the dataframe space. We use Polars, and it would be great to be able to create a dataset directly from the Arrow format. Also, pandas 2.0 will have Arrow as a backend later this month.

barynton commented 1 year ago

Conan installation support #5770

sanurielf commented 1 year ago

Add support for Multi-output regression #524

vladv14 commented 1 year ago

Provide access to the bin ids and bin upper bounds of the constructed dataset #5191

onacrame commented 1 year ago

Consider implementing the SketchBoost algorithm for the multi-output/multiclass setting. The current multiclass approach is highly inefficient, as a separate tree structure is required for each class. SketchBoost significantly improves training time and model size by allowing a single tree structure to handle many classes.

This is already implemented in the Py-Boost library.

https://arxiv.org/pdf/2211.12858.pdf
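A pure-Python illustration of the core idea (this mirrors only the "Top-outputs" gradient sketch from the paper linked above, not Py-Boost's actual implementation): with C classes there is one gradient per class per example, and instead of growing C trees, the split search keeps only the k classes whose gradient columns have the largest norm.

```python
import math

# Top-outputs sketch: grad_matrix has one row per example and one column
# per class; return the k class indices with the largest column L2 norm,
# i.e. the classes that dominate the split-gain computation.
def top_outputs_sketch(grad_matrix, k):
    n_classes = len(grad_matrix[0])
    norms = [math.sqrt(sum(row[c] ** 2 for row in grad_matrix))
             for c in range(n_classes)]
    return sorted(range(n_classes), key=lambda c: norms[c], reverse=True)[:k]

grads = [[0.9, -0.1, 0.05], [0.8, 0.2, -0.02]]
print(top_outputs_sketch(grads, 2))  # prints [0, 1]
```

One shared tree structure is then grown using only the sketched gradient columns, which is where the training-time and model-size savings come from.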

borchero commented 11 months ago

I am currently working on Apache Arrow support and will likely open a PR next week :)

Update: Implementation in https://github.com/microsoft/LightGBM/pull/6022

kaizhu256 commented 9 months ago

WebAssembly support (https://github.com/microsoft/LightGBM/issues/5372)

jane-delaney commented 7 months ago

Support monotone constraints with quantile objective #3371

bhvieira commented 5 months ago

Recalculate feature importance during the update process of a tree model / Calculate Gain Importance on Test Data (#2413)

cgoo4 commented 4 months ago

Add R-package support for an early-stopping min_delta as implemented in Python #4580 and referenced in #2526.