chore(deps): bump xgboost from 1.1.1 to 1.2.0

Bumps xgboost from 1.1.1 to 1.2.0.

Release notes

Release 1.2.0 stable

XGBoost4J-Spark now supports the GPU algorithm (#5171)

Now XGBoost4J-Spark is able to leverage NVIDIA GPU hardware to speed up training.

Work is ongoing to accelerate the rest of the data pipeline with NVIDIA GPUs (#5950, #5972).

XGBoost now supports CUDA 11 (#5808)

It is now possible to build XGBoost with CUDA 11. Note that we do not yet distribute pre-built binaries built with CUDA 11; all current distributions use CUDA 10.0.

Better guidance for persisting XGBoost models in an R environment (#5940, #5964)

Users are strongly encouraged to use xgb.save() and xgb.save.raw() instead of saveRDS(). This is so that the persisted models can be accessed with future releases of XGBoost.

The previous release (1.1.0) had problems loading models that were saved with saveRDS(). This release adds a compatibility layer to restore access to the old RDS files. Note that this is meant to be a temporary measure; users are advised to stop using saveRDS() and migrate to xgb.save() and xgb.save.raw().

New objectives and metrics

The pseudo-Huber loss reg:pseudohubererror is added (#5647). The corresponding metric is mphe. Right now, the slope is hard-coded to 1.

The Accelerated Failure Time objective for survival analysis (survival:aft) is now accelerated on GPUs (#5714, #5716). The survival metrics aft-nloglik and interval-regression-accuracy are also accelerated on GPUs.
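To make the pseudo-Huber objective mentioned above concrete, here is a minimal plain-Python sketch of the loss with its first and second derivatives, assuming the standard textbook form and a slope of 1. The function name `pseudo_huber` and the constant `SLOPE` are illustrative and not part of the XGBoost API.

```python
import math

SLOPE = 1.0  # assumption: mirrors the hard-coded slope mentioned above


def pseudo_huber(pred, label, slope=SLOPE):
    """Return (loss, gradient, hessian) of the pseudo-Huber loss for one sample."""
    z = pred - label
    scale = 1.0 + (z / slope) ** 2
    root = math.sqrt(scale)
    loss = slope ** 2 * (root - 1.0)
    grad = z / root              # first derivative with respect to pred
    hess = 1.0 / (scale * root)  # second derivative with respect to pred
    return loss, grad, hess


loss, grad, hess = pseudo_huber(pred=1.75, label=1.0)
print(loss, grad, hess)
```

For small residuals the loss behaves like squared error (the hessian is close to 1), while for large residuals the gradient saturates toward the slope, which is what makes the loss less sensitive to outliers.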

Improved integration with scikit-learn

Added n_features_in_ attribute to the scikit-learn interface to store the number of features used (#5780). This is useful for integrating with some scikit-learn features such as StackingClassifier. See this link for more details.

XGBoostError now inherits from ValueError, which conforms to scikit-learn's exception requirement (#5696).
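The practical effect of the new base class can be sketched without the library installed. The class below is a stand-in that only copies the inheritance relationship described above; `fit_stub` and `run` are hypothetical helpers for illustration.

```python
class XGBoostError(ValueError):
    """Stand-in for the library's XGBoostError; only the base class matters here."""


def fit_stub():
    # Hypothetical failure raised somewhere inside a training call.
    raise XGBoostError("invalid parameter value")


def run():
    try:
        fit_stub()
    except ValueError as err:  # generic handler that knows nothing about XGBoost
        return f"caught {type(err).__name__}: {err}"


print(run())
```

Code that already catches ValueError, as generic scikit-learn tooling may do, would therefore also see errors raised by XGBoost.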

Improved integration with Dask

The XGBoost Dask API now exposes an asynchronous interface (#5862). See the document for details.

Zero-copy ingestion of GPU arrays via DaskDeviceQuantileDMatrix (#5623, #5799, #5800, #5803, #5837, #5874, #5901): Previously, the Dask interface had to make 2 data copies: one for concatenating the Dask partition/block into a single block and another for internal representation. To save memory, we introduce DaskDeviceQuantileDMatrix. As long as Dask partitions are resident in the GPU memory, DaskDeviceQuantileDMatrix is able to ingest them directly without making copies. This matrix type wraps DeviceQuantileDMatrix.

The prediction function now returns GPU Series type if the input is from Dask-cuDF (#5710). This is to preserve the input data type.

Robust handling of external data types (#5689, #5893)

As we support more and more external data types, the handling logic has proliferated all over the code base and has become hard to keep track of. It also became unclear how missing values and threads are handled. We refactored the Python package code to collect all data handling logic in a central location, and now we have an explicit list of all supported data types.

Improvements in GPU-side data matrix (DeviceQuantileDMatrix)

The GPU-side data matrix now implements its own quantile sketching logic, so that data don't have to be transported back to the main memory (#5700, #5747, #5760, #5846, #5870, #5898). The GK sketching algorithm is also now better documented.

Now we can load extremely sparse datasets such as URL, although performance is still sub-optimal.

The GPU-side data matrix now exposes an iterative interface (#5783), so that users are able to construct a matrix from a data iterator. See the Python demo.
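As a rough illustration of what the quantile sketching mentioned above produces, the following sketch computes exact quantile cut points for one feature and maps values to histogram bins. It is a simplification under stated assumptions: it sorts the full column on the CPU, whereas the library uses an approximate sketch on the GPU. The helper names `quantile_cuts` and `bin_index` are hypothetical.

```python
from bisect import bisect_right


def quantile_cuts(values, n_bins):
    """Exact cut points splitting the values into n_bins roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    # One cut per interior bin boundary, taken at evenly spaced ranks.
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_index(value, cuts):
    """Map a feature value to the index of the histogram bin it falls into."""
    return bisect_right(cuts, value)


feature = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0]
cuts = quantile_cuts(feature, n_bins=4)
bins = [bin_index(v, cuts) for v in feature]
print(cuts, bins)
```

Once every value is replaced by its bin index, the tree builder only needs the compact binned representation, which is why computing the cuts where the data already lives avoids a transfer back to main memory.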

New language binding: Swift (#5728)

Visit https://github.com/kongzii/SwiftXGBoost for more details.

Robust model serialization with JSON (#5772, #5804, #5831, #5857, #5934)

We continue efforts from the 1.0.0 release to adopt JSON as the format to save and load models robustly.

JSON model IO is significantly faster and produces smaller model files.

Round-trip reproducibility is guaranteed, via the introduction of an efficient float-to-string conversion algorithm known as the Ryū algorithm. The conversion is locale-independent, producing consistent numeric representation regardless of the locale setting of the user's machine.

We fixed an issue in loading large JSON files to memory.

It is now possible to load a JSON file from a remote source such as S3.
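The round-trip guarantee described above can be illustrated in plain Python. This is only an analogy: Python's `json` module relies on its own shortest round-trip float formatting rather than the Ryū implementation, but the property shown is the same.

```python
import json

value = 0.1 + 0.2  # stored as 0.30000000000000004, not as 0.3

# Shortest round-trip text: parsing it back yields the identical float.
text = json.dumps(value)
restored = json.loads(text)

# Fixed-precision formatting is shorter but does not recover the original value.
lossy = "%.6g" % value

print(text, restored == value)
print(lossy, float(lossy) == value)
```

A model saved with round-trip-safe text produces bit-identical split thresholds and leaf values after loading, whereas a fixed-precision format could shift them slightly.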

Performance improvements

CPU hist tree method optimization

Skip missing lookup in hist row partitioning if data is dense. (#5644)

Specialize training procedures for CPU hist tree method on distributed environment. (#5557)

Add single-precision histogram for CPU hist. Previously, the gradient histogram for CPU hist was hard-coded to be 64-bit; now users can set the parameter single_precision_histogram to use a 32-bit histogram instead for faster training. (#5624, #5811)
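The trade-off behind single_precision_histogram can be sketched with the standard library alone. The example below accumulates the same gradients into a 64-bit and a 32-bit histogram; `build_histogram` is a hypothetical helper, and the data is synthetic, so this shows the general idea rather than the library's implementation.

```python
from array import array


def build_histogram(gradients, bin_ids, n_bins, typecode):
    """Sum gradients per bin; typecode 'd' stores 64-bit sums, 'f' stores 32-bit sums."""
    hist = array(typecode, [0.0] * n_bins)
    for g, b in zip(gradients, bin_ids):
        hist[b] += g  # the sum is rounded to the array's precision when stored
    return hist


gradients = [0.1] * 1000
bin_ids = [i % 4 for i in range(1000)]

hist64 = build_histogram(gradients, bin_ids, 4, "d")
hist32 = build_histogram(gradients, bin_ids, 4, "f")

print(hist64.itemsize, hist32.itemsize)  # bytes used per bin
print(max(abs(a - b) for a, b in zip(hist64, hist32)))  # rounding gap between the two
```

The 32-bit histogram takes half the memory per bin at the cost of a small rounding gap in the accumulated sums.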

GPU hist tree method optimization

Removed some unnecessary synchronizations and improved the memory allocation pattern. (#5707)

Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in xgboost library in reverse chronological order.

v1.1.0 (2020.05.17)

Better performance on multi-core CPUs (#5244, #5334, #5522)

Poor performance scaling of the hist algorithm for multi-core CPUs has been under investigation (#3810). #5244 concludes the ongoing effort to improve performance scaling on multi-CPUs, in particular Intel CPUs. Roadmap: #5104

#5334 takes steps toward reducing memory consumption for the hist tree method on CPU.

#5522 optimizes random number generation for data sampling.

Deterministic GPU algorithm for regression and classification (#5361)

The GPU algorithm for regression and classification tasks is now deterministic.

Roadmap: #5023. Currently only single-GPU training is deterministic. Distributed training with multiple GPUs is not yet deterministic.

Improve external memory support on GPUs (#5093, #5365)

Starting from 1.0.0 release, we added support for external memory on GPUs to enable training with larger datasets. Gradient-based sampling (#5093) speeds up the external memory algorithm by intelligently sampling a subset of the training data to copy into the GPU memory. Learn more about out-of-core GPU gradient boosting.

GPU-side data sketching now works with data from external memory (#5365).

Parameter validation: detection of unused or incorrect parameters (#5477, #5569, #5508)

Misspelled training parameters are a common user mistake. In previous versions of XGBoost, misspelled parameters were silently ignored. Starting with the 1.0.0 release, XGBoost produces a warning message if there are any unused training parameters. The 1.1.0 release makes parameter validation available to the scikit-learn interface (#5477) and the R binding (#5569).
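The kind of check described above can be sketched as follows. This is a hypothetical illustration, not the library's implementation: `KNOWN_PARAMS` is a small illustrative subset of parameter names, and the "did you mean" hint is an addition for this example, since the notes only describe a warning about unused parameters.

```python
import difflib

# Illustrative subset of training parameter names, not the full list.
KNOWN_PARAMS = ["max_depth", "eta", "subsample", "objective"]


def unused_parameters(params):
    """Return {unknown_name: closest_known_name_or_None} for names not in KNOWN_PARAMS."""
    report = {}
    for name in params:
        if name not in KNOWN_PARAMS:
            close = difflib.get_close_matches(name, KNOWN_PARAMS, n=1)
            report[name] = close[0] if close else None
    return report


report = unused_parameters({"max_depht": 6, "eta": 0.3})
for name, hint in report.items():
    print(f"Parameter {name!r} is not used. Closest known name: {hint!r}")
```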

Thread-safe, in-place prediction method (#5389, #5512)

Previously, the prediction method was not thread-safe (#5339). This release adds a new API function inplace_predict() that is thread-safe. It is now possible to serve concurrent requests for prediction using a shared model object.

It is now possible to compute prediction in-place for selected data formats (numpy.ndarray / scipy.sparse.csr_matrix / cupy.ndarray / cudf.DataFrame / pd.DataFrame) without creating a DMatrix object.

Addition of Accelerated Failure Time objective for survival analysis (#4763, #5473, #5486, #5552, #5553)

Survival analysis (regression) models the time it takes for an event of interest to occur. The target label is potentially censored, i.e. the label is a range rather than a single number. We added a new objective survival:aft to support survival analysis. Also added is the new API to specify the ranged labels. Check out the tutorial and the demos.

GPU support is work in progress (#5714).
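The ranged labels described above can be pictured as a pair of bounds per sample. The sketch below encodes the usual censoring cases in plain Python; the class `RangedLabel` and its `kind` method are hypothetical, and the tutorial referenced above remains the authority on how bounds are passed to the library.

```python
import math
from dataclasses import dataclass


@dataclass
class RangedLabel:
    lower: float  # earliest time the event could have occurred
    upper: float  # latest time; math.inf means the event was not observed

    def kind(self):
        if self.lower == self.upper:
            return "uncensored"
        if math.isinf(self.upper):
            return "right-censored"
        if self.lower == 0.0:
            return "left-censored"
        return "interval-censored"


labels = [
    RangedLabel(2.0, 2.0),       # event observed at t = 2
    RangedLabel(3.0, math.inf),  # still event-free at t = 3
    RangedLabel(0.0, 4.0),       # event happened some time before t = 4
    RangedLabel(4.0, 5.0),       # event happened between t = 4 and t = 5
]
lower_bounds = [lab.lower for lab in labels]
upper_bounds = [lab.upper for lab in labels]
print([lab.kind() for lab in labels])
```

Keeping the lower and upper bounds as two parallel arrays matches the idea that each label is a range rather than a single number.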

Improved installation experience on Mac OSX (#5597, #5602, #5606, #5701)

It only takes two commands to install the XGBoost Python package: brew install libomp followed by pip install xgboost. The installed XGBoost will use all CPU cores. Even better, starting with this release, we distribute pre-compiled binary wheels targeting Mac OSX. Now the install command pip install xgboost finishes instantly, as it no longer compiles the C++ source of XGBoost. The last three Mac versions (High Sierra, Mojave, Catalina) are supported.

R package: the 1.1.0 release fixes the error Initializing libomp.dylib, but found libomp.dylib already initialized (#5701)

Ranking metrics are now accelerated on GPUs (#5380, #5387, #5398)

GPU-side data matrix to ingest data directly from other GPU libraries (#5420, #5465)

Previously, data on GPU memory had to be copied back to the main memory before it could be used by XGBoost. Starting with 1.1.0 release, XGBoost provides a dedicated interface (DeviceQuantileDMatrix) so that it can ingest data from GPU memory directly. The result is that XGBoost interoperates better with GPU-accelerated data science libraries, such as cuDF, cuPy, and PyTorch.

Set device in device dmatrix. (#5596)

Robust model serialization with JSON (#5123, #5217)

We continue efforts from the 1.0.0 release to adopt JSON as the format to save and load models robustly. Refer to the release note for 1.0.0 to learn more.

It is now possible to store internal configuration of the trained model (Booster) object in R as a JSON string (#5123, #5217).

Improved integration with Dask

Pass through verbose parameter for dask fit (#5413)

Use DMLC_TASK_ID. (#5415)

Order the prediction result. (#5416)

Honor nthreads from dask worker. (#5414)

Commits

7387866 Release 1.2.0
04232c0 [CI] Fix broken tests (#6048)
0353a78 Fix scikit learn cls doc. (#6041)
0089a0e Fix another typo
03a68a1 Fix typo
a0da8a7 Make RC2
eee4eff [CI] Build GPU-enabled JAR artifact and deploy to xgboost-maven-repo
936a854 Back port fixes to 1.2 (#6002)
7856da5 [CI] Use mgpu machine to run gpu hist unit tests
50a0def Make RC1
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com):
- Update frequency (including time of day and day of week)
- Pull request limits (per update run and/or open at any time)
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)

kjappelbaum / oximachinerunner

chore(deps): bump xgboost from 1.1.1 to 1.2.0 #7

