beckernick opened this issue 2 years ago
Comparing multi-GPU dask.xgboost regressor training with single-GPU XGBoost training on the same sample dataset, I generally get similar RMSE results if I use the same number of boosting rounds. This doesn't entirely surprise me, as my understanding is that each boosting round forces a synchronization of gradient statistics across workers.
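For context, here is a minimal sketch of that boosting-round comparison (not the original benchmark; the synthetic data, chunking, and parameters are illustrative assumptions, and it assumes an XGBoost build with the `xgboost.dask` module and `gpu_hist` support):

```python
# Illustrative sketch, not the original benchmark script.
import dask.array as da
import numpy as np
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    client = Client(LocalCUDACluster())  # one worker per visible GPU

    # Synthetic linear regression data (shapes are illustrative assumptions)
    rng = np.random.default_rng(12)
    X = rng.standard_normal((100_000, 20)).astype(np.float32)
    w = rng.standard_normal(20).astype(np.float32)
    y = X @ w + 0.1 * rng.standard_normal(100_000).astype(np.float32)

    params = {"tree_method": "gpu_hist", "objective": "reg:squarederror"}

    # Multi-GPU: each boosting round aggregates gradient histograms across
    # workers, so every tree is built from full-dataset statistics
    dtrain = xgb.dask.DaskDMatrix(
        client,
        da.from_array(X, chunks=(25_000, 20)),
        da.from_array(y, chunks=(25_000,)),
    )
    out = xgb.dask.train(client, params, dtrain, num_boost_round=100)
    dask_pred = xgb.dask.predict(client, out, dtrain).compute()

    # Single GPU: same params and same number of boosting rounds
    booster = xgb.train(params, xgb.DMatrix(X, y), num_boost_round=100)
    local_pred = booster.predict(xgb.DMatrix(X))

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - y) ** 2)))

    print("multi-GPU  RMSE:", rmse(dask_pred))
    print("single-GPU RMSE:", rmse(local_pred))
```

Because the gradient histograms are aggregated across all workers every round, the distributed booster effectively sees full-dataset statistics for each tree, which is consistent with the similar RMSE results described above.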
However, when comparing multi-GPU random forest regressor training with standard single-GPU random forest regressor training on the same sample dataset, I generally get significantly better results with the single-GPU estimator when using the same total number of trees and what I believe to be the same configuration (`max_depth` defaults to -1 for cuml.dask.ensemble.RF, so I set it to 1000 in the single-GPU test). Based on the distributed RF implementation (embarrassingly parallel tree construction, with each worker holding a portion of the data locally), is this expected behavior? Does the per-worker data handling differ from how XGBoost handles it?
With pseudo-randomly generated data, I wouldn't initially expect data skew/ordering to be significant here.
This result is shown with the following example:
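Below is a minimal sketch of that comparison, assuming a recent cuML/dask-cuda stack; the dataset shape, tree count, and the shared `max_depth` value are illustrative assumptions, not the original benchmark values:

```python
# Illustrative sketch, not the original reproducer.
import cupy as cp
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

from cuml.dask.datasets import make_regression
from cuml.dask.ensemble import RandomForestRegressor as DaskRF
from cuml.ensemble import RandomForestRegressor as LocalRF

if __name__ == "__main__":
    client = Client(LocalCUDACluster())  # one worker per visible GPU
    n_workers = len(client.has_what())

    # Pseudo-random regression data, one partition per worker
    X, y = make_regression(
        n_samples=1_000_000, n_features=20,
        n_parts=n_workers, random_state=12,
    )
    X, y = client.persist([X, y])

    # Multi-GPU RF: n_estimators is the total tree count split across workers,
    # and each worker builds its trees from only its local partition
    dask_rf = DaskRF(n_estimators=100, max_depth=16)
    dask_rf.fit(X, y)
    dask_pred = dask_rf.predict(X).compute()

    # Single-GPU RF: same total trees and configuration, full dataset on one GPU
    X_local, y_local = X.compute(), y.compute()
    local_rf = LocalRF(n_estimators=100, max_depth=16)
    local_rf.fit(X_local, y_local)
    local_pred = local_rf.predict(X_local)

    def rmse(pred):
        return float(cp.sqrt(cp.mean((cp.asarray(pred) - y_local) ** 2)))

    print("multi-GPU RF  RMSE:", rmse(dask_pred))
    print("single-GPU RF RMSE:", rmse(local_pred))
```

Since each worker's trees only ever sample from its local shard, the effective bootstrap pool per tree is roughly n_samples / n_workers rather than the full dataset, which seems like one plausible source of the accuracy gap.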