kjappelbaum / oximachinerunner

An easy API for using oximachine.
MIT License

chore(deps): update xgboost requirement from ~=1.4.2 to ~=1.5.0 #68

Closed dependabot[bot] closed 2 years ago

dependabot[bot] commented 2 years ago

Updates the requirements on xgboost to permit the latest version.

Release notes

Sourced from xgboost's releases.

Release 1.5.0 stable

This release comes with many exciting new features and optimizations, along with some bug fixes. We will describe the experimental categorical data support and the external memory interface independently. Package-specific new features will be listed in respective sections.

Development on categorical data support

In version 1.3, XGBoost introduced an experimental feature for handling categorical data natively, without one-hot encoding. XGBoost can fit categorical splits in decision trees. (Currently, the generated splits will be of the form x \in {v}, where the input is compared to a single category value. A future version of XGBoost will generate splits that compare the input against a list of multiple category values.)

Most of the other features, including prediction, SHAP value computation, feature importance, and model plotting, were revised to natively handle categorical splits. Also, all Python interfaces, including the native interface (with and without quantized DMatrix), the scikit-learn interface, and the Dask interface, now accept categorical data with support for a wide range of data structures, including numpy/cupy arrays and cuDF/pandas/modin dataframes. In practice, the following are required for enabling categorical data support during training:

  • Use the Python package.
  • Use gpu_hist to train the model.
  • Use the JSON model file format for saving the model.

Once the model is trained, it can be used with most of the features that are available on the Python package. For a quick introduction, see https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html

Related PRs: (#7011, #7001, #7042, #7041, #7047, #7043, #7036, #7054, #7053, #7065, #7213, #7228, #7220, #7221, #7231, #7306)

  • Next steps

    • Revise the CPU training algorithm to handle categorical data natively and generate categorical splits
    • Extend the CPU and GPU algorithms to generate categorical splits of the form x \in S, where the input is compared with multiple category values. (#7081)

External memory

This release features a brand-new interface and implementation for external memory (also known as out-of-core training). (#6901, #7064, #7088, #7089, #7087, #7092, #7070, #7216) The new implementation leverages the data iterator interface, which is currently used to create DeviceQuantileDMatrix. For a quick introduction, see https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html#data-iterator . During the development of this new interface, lz4 compression was removed. (#7076) Please note that external memory support is still experimental and not yet ready for production use. All future development will focus on this new interface, and users are advised to migrate. (You are using the old interface if you specify a URL suffix to use external memory.)

New features in Python package

... (truncated)

Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in the xgboost library in reverse chronological order.

v1.4.2 (2021.05.13)

This is a patch release for the Python package with the following fixes:

  • Handle the latest version of cupy.ndarray in inplace_predict. (#6933)
  • Ensure output array from predict_leaf is (n_samples, ) when there's only 1 tree. 1.4.0 outputs (n_samples, 1). (#6889)
  • Fix empty dataset handling with multi-class AUC. (#6947)
  • Handle object type from pandas in inplace_predict. (#6927)

v1.4.1 (2021.04.20)

This is a bug fix release.

  • Fix GPU implementation of AUC on some large datasets. (#6866)

v1.4.0 (2021.04.12)

Introduction of pre-built binary package for R, with GPU support

Starting with release 1.4.0, users now have the option of installing {xgboost} without having to build it from source. This is particularly advantageous for users who want to take advantage of the GPU algorithm (gpu_hist), as previously they would have had to build {xgboost} from source using CMake and NVCC. Now installing {xgboost} with GPU support is as easy as: R CMD INSTALL ./xgboost_r_gpu_linux.tar.gz. (#6827)

See the instructions at https://xgboost.readthedocs.io/en/latest/build.html

Improvements on prediction functions

XGBoost has many prediction types, including SHAP value computation and inplace prediction. In 1.4 we overhauled the underlying prediction functions for the C API and Python API with a unified interface. (#6777, #6693, #6653, #6662, #6648, #6668, #6804)

  • Starting with 1.4, sklearn interface prediction will use inplace predict by default when input data is supported.
  • Users can use inplace predict with dart booster and enable GPU acceleration just like gbtree.
  • Also, all prediction functions with tree models are now thread-safe, and inplace predict now supports base_margin.
  • A new set of C predict functions are exposed in the public interface.
  • A user-visible change is a newly added parameter called strict_shape. See https://xgboost.readthedocs.io/en/latest/prediction.html for more details.

Improvement on Dask interface

  • Starting with 1.4, the Dask interface is considered feature-complete, meaning all of the models found in the single-node Python interface are now supported in Dask, including but not limited to ranking and random forests. The prediction function is also significantly faster and supports SHAP value computation.

... (truncated)

Commits


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
dependabot[bot] commented 2 years ago

Superseded by #71.