kjappelbaum / oximachinerunner

An easy API for using oximachine.
MIT License

chore(deps): update xgboost requirement from ~=1.3.0 to ~=1.3.3 #56

Closed. dependabot[bot] closed this pull request 3 years ago.

dependabot[bot] commented 3 years ago

Updates the requirements on xgboost to permit the latest version.
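In practice, a PR like this changes only the version specifier. The file below is a hypothetical illustration (the actual diff is not shown here), but it captures what `~=` permits:

```python
# Hypothetical setup.py excerpt (this repository's actual dependency file is
# not shown in this PR). PEP 440's "~=1.3.3" means >=1.3.3 and <1.4, so all
# future 1.3.x patch releases are permitted while 1.4 is excluded.
install_requires = [
    "xgboost~=1.3.3",  # previously "xgboost~=1.3.0"
]
```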

Release notes

Sourced from xgboost's releases.

1.3.3 Patch Release

  • Fix regression on best_ntree_limit. (#6616)
Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in the xgboost library in reverse chronological order.

v1.3.0 (2020.12.08)

XGBoost4J-Spark: Exceptions should cancel jobs gracefully instead of killing SparkContext (#6019).

  • By default, exceptions in XGBoost4J-Spark cause the whole SparkContext to shut down, necessitating a restart of the Spark cluster. This behavior is often a major inconvenience.
  • Starting from 1.3.0 release, XGBoost adds a new parameter killSparkContextOnWorkerFailure to optionally prevent killing SparkContext. If this parameter is set, exceptions will gracefully cancel training jobs instead of killing SparkContext.

GPUTreeSHAP: GPU acceleration of the TreeSHAP algorithm (#6038, #6064, #6087, #6099, #6163, #6281, #6332)

  • SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain predictions of machine learning models. It computes feature importance scores for individual examples, establishing how each feature influences a particular prediction. TreeSHAP is an optimized SHAP algorithm specifically designed for decision tree ensembles.
  • Starting with 1.3.0 release, it is now possible to leverage CUDA-capable GPUs to accelerate the TreeSHAP algorithm. Check out the demo notebook.
  • The CUDA implementation of the TreeSHAP algorithm is hosted at rapidsai/GPUTreeSHAP; XGBoost imports it as a Git submodule. A usage sketch follows this list.
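Below is a minimal sketch of GPU-accelerated SHAP computation, assuming a CUDA-capable GPU and synthetic data; the parameter names are from the 1.3 documentation:

```python
import numpy as np
import xgboost as xgb

# Synthetic regression data for illustration.
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "gpu_hist",     # train on the GPU
    "predictor": "gpu_predictor",  # route predictions, incl. SHAP, to the GPU
}
bst = xgb.train(params, dtrain, num_boost_round=50)

# pred_contribs=True returns per-feature SHAP values plus a bias column;
# with the GPU predictor active, this is computed by GPUTreeSHAP.
shap_values = bst.predict(dtrain, pred_contribs=True)
print(shap_values.shape)  # (1000, 11): 10 feature contributions + bias
```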

New style Python callback API (#6199, #6270, #6320, #6348, #6376, #6399, #6441)

  • The XGBoost Python package now offers a re-designed callback API. The new callback API lets you design various extensions of training in idiomatic Python. In addition, the new callback API allows you to use early stopping with the native Dask API (xgboost.dask). Check out the tutorial and the demo; a minimal sketch follows.
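The sketch below uses synthetic data; `TrainingCallback` and `EarlyStopping` are the documented entry points, while the logging logic here is purely illustrative:

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(500, 5), np.random.randint(0, 2, 500)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

class PrintLogLoss(xgb.callback.TrainingCallback):
    """Print the validation metric after every boosting round."""
    def after_iteration(self, model, epoch, evals_log):
        # evals_log maps eval-set names to {metric_name: [history]}.
        loss = evals_log["valid"]["logloss"][-1]
        print(f"round {epoch}: validation logloss {loss:.4f}")
        return False  # returning True would stop training early

bst = xgb.train(
    {"objective": "binary:logistic"},
    dtrain,
    num_boost_round=20,
    evals=[(dvalid, "valid")],
    callbacks=[PrintLogLoss(), xgb.callback.EarlyStopping(rounds=5)],
)
```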

Enable the use of DeviceQuantileDMatrix / DaskDeviceQuantileDMatrix with large data (#6201, #6229, #6234).

  • DeviceQuantileDMatrix saves memory by avoiding extra copies of the training data, and the savings grow with data size. Unfortunately, data with more than 2^31 elements triggered integer overflow bugs in CUB and Thrust. Tracking issue: #6228.
  • This release contains a series of workarounds to allow the use of DeviceQuantileDMatrix with large data (a construction sketch follows this list):
    • Loop over copy_if (#6201)
    • Loop over thrust::reduce (#6229)
    • Implement the inclusive scan algorithm in-house, to handle large offsets (#6234)
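A minimal construction sketch, assuming cupy and a CUDA-capable GPU (DeviceQuantileDMatrix is built directly from device memory and is only consumable by gpu_hist):

```python
import cupy as cp
import xgboost as xgb

# Synthetic data generated directly on the GPU.
X = cp.random.rand(100_000, 20, dtype=cp.float32)
y = cp.random.rand(100_000, dtype=cp.float32)

# Quantile sketches are computed on-device during construction,
# avoiding an extra host-side copy of the training data.
dtrain = xgb.DeviceQuantileDMatrix(X, label=y)
bst = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=10)
```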

Support slicing of tree models (#6302)

  • Accessing the best iteration of a model after early stopping used to be error-prone, since the user had to manually pass the ntree_limit argument to the predict() function (a sketch follows this list).
  • Now we provide a simple interface to slice tree models by specifying a range of boosting rounds. The tree ensemble can be split into multiple sub-ensembles via the slicing interface. Check out an example.
  • In addition, the early stopping callback now supports save_best option. When enabled, XGBoost will save (persist) the model at the best boosting round and discard the trees that were fit subsequent to the best round.
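A minimal sketch of both features on synthetic data:

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(500, 5), np.random.randint(0, 2, 500)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

bst = xgb.train(
    {"objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
    evals=[(dvalid, "valid")],
    # save_best=True persists the model as of the best round, so no
    # ntree_limit bookkeeping is needed at prediction time.
    callbacks=[xgb.callback.EarlyStopping(rounds=5, save_best=True)],
)
preds = bst.predict(dvalid)

# Slicing also works directly: keep only the first ten boosting rounds.
first_ten = bst[:10]
```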

Weighted subsampling of features (columns) (#5962)

  • It is now possible to sample features (columns) via weighted subsampling, in which features with higher weights are more likely to be selected in the sample. Weighted subsampling allows you to encode domain knowledge by emphasizing a particular set of features in the choice of tree splits. In addition, you can prevent particular features from being used in any splits, by assigning them zero weights.
  • Check out the demo; a minimal sketch follows.
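The sketch below uses synthetic data; weights are attached to the DMatrix via set_info and only take effect while a colsample_* parameter is below 1.0:

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(500, 5), np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

# Feature 0 is five times more likely to be sampled than the others;
# a zero weight would exclude a feature from all splits.
dtrain.set_info(feature_weights=np.array([5.0, 1.0, 1.0, 1.0, 1.0]))

bst = xgb.train(
    {"colsample_bynode": 0.5},  # sampling must be active for weights to matter
    dtrain,
    num_boost_round=10,
)
```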

Improved integration with Dask

  • Support reverse-proxy environments such as Google Kubernetes Engine (#6343, #6475)
  • An XGBoost training job will no longer use all available workers. Instead, it will only use the workers that contain input data (#6343).
  • The new callback API works well with the Dask training API.
  • The predict() and fit() functions of DaskXGBClassifier and DaskXGBRegressor now accept a base margin (#6155).
  • Support more metadata in the Dask API (#6130, #6132, #6333).
  • Allow passing extra keyword arguments as kwargs in predict() (#6117)
  • Fix typo in dask interface: sample_weights -> sample_weight (#6240)
  • Allow empty data matrix in AFT survival, as Dask may produce empty partitions (#6379)
  • Speed up prediction by overlapping prediction jobs in all workers (#6412). A minimal usage sketch follows this list.
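A minimal sketch of the Dask integration using the scikit-learn style wrapper (assumes dask and distributed are installed; a two-worker local cluster stands in for a real deployment):

```python
import dask.array as da
import xgboost as xgb
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    with Client(LocalCluster(n_workers=2)) as client:
        # Chunked synthetic data; each chunk may live on a different worker.
        X = da.random.random((10_000, 10), chunks=(1_000, 10))
        y = da.random.random(10_000, chunks=1_000)

        reg = xgb.dask.DaskXGBRegressor(n_estimators=50, tree_method="hist")
        reg.fit(X, y)           # only workers that hold input data take part
        preds = reg.predict(X)  # lazily evaluated dask array
        print(preds.compute()[:5])
```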

Experimental support for direct splits with categorical features (#6028, #6128, #6137, #6140, #6164, #6165, #6166, #6179, #6194, #6219)

  • Currently, XGBoost requires users to one-hot-encode categorical variables. This has adverse performance implications, as the creation of many dummy variables results in higher memory consumption and may require fitting deeper trees to achieve equivalent model accuracy.
  • The 1.3.0 release of XGBoost contains experimental support for direct handling of categorical variables in test nodes. Each test node will have a condition of the form feature_value \in match_set, where the match_set on the right-hand side contains one or more matching categories. The matching categories in match_set represent the condition for traversing to the right child node. Currently, XGBoost will only generate categorical splits with a single matching category ("one-vs-rest split"). In a future release, we plan to remove this restriction and produce splits with multiple matching categories in match_set. A usage sketch follows this list.
  • The categorical split requires the use of JSON model serialization. The legacy binary serialization method cannot be used to save (persist) models with categorical splits.
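A minimal sketch of the experimental feature, assuming a CUDA-capable GPU (in 1.3 only gpu_hist can consume categorical data; a pandas "category" dtype plus enable_categorical=True marks the columns):

```python
import numpy as np
import pandas as pd
import xgboost as xgb

# One categorical and one numerical feature, synthetic for illustration.
df = pd.DataFrame({
    "color": pd.Series(np.random.choice(["red", "green", "blue"], 1000),
                       dtype="category"),
    "size": np.random.rand(1000),
})
y = np.random.rand(1000)

dtrain = xgb.DMatrix(df, label=y, enable_categorical=True)
bst = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=10)

# Categorical splits can only be serialized with the JSON format; the
# legacy binary format cannot represent them.
bst.save_model("model.json")
```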

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)