Open rkwojdan opened 4 years ago
Hi,
A single model is trained together across all the machines. Each machine contains a partition of the data. To add a node to the tree during training, each machine builds a histogram over its data, and then sends the histogram to the aggregator. The aggregator then "aggregates" the histograms together, and sends the result back to each machine.
This is basically a variant of how distributed training is done in XGBoost. If you would like to understand the algorithm better, section 3.2 of the original XGBoost paper might help: https://arxiv.org/pdf/1603.02754.pdf
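The histogram aggregation described above can be sketched in a few lines. This is a minimal illustration under assumed names (`local_histogram`, `aggregate` are hypothetical, not functions from the actual federated XGBoost code): each machine bins the gradients of its own rows, and the aggregator performs an elementwise sum, which is exactly what an allreduce-sum would compute.

```python
import numpy as np

NUM_BINS = 4  # illustrative number of feature-value bins

def local_histogram(gradients, bin_ids, num_bins=NUM_BINS):
    """Each machine sums the gradients of its own rows into bins."""
    hist = np.zeros(num_bins)
    for g, b in zip(gradients, bin_ids):
        hist[b] += g
    return hist

def aggregate(histograms):
    """The aggregator's role: an elementwise sum of the per-machine
    histograms (an allreduce-style sum)."""
    return np.sum(histograms, axis=0)

# Two machines, each holding a partition of the data.
machine_a = local_histogram([0.5, -0.2, 0.1], [0, 1, 0])
machine_b = local_histogram([0.3, 0.4], [1, 3])

# The result each machine gets back is the same histogram it would
# have built if it had seen the full dataset.
global_hist = aggregate([machine_a, machine_b])
print(global_hist)
```

The key point is that only histograms (sums of gradient statistics), never raw rows, leave a machine; the split-finding step then runs on `global_hist` identically on every node.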
Hi,
currently I am mostly interested in the model aggregation part of Federated Learning. However, I cannot understand how it is done here. I guess it is done with rabit, but I cannot find any
allreduce
call in the code, or see how the global model is updated. As of now I have a feeling it works like this: 1) XGBoost model 1 is trained on local data 1; 2) XGBoost model 1 is the input to XGBoost model 2, which is trained on local data 2; 3) this ends when all local data and intermediate XGBoost models have been used.
It resembles an online learning scheme.
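For concreteness, the sequential scheme guessed at above could be sketched as below. To be clear, this is only an illustration of that hypothesis, not of the actual mechanism (which aggregates histograms during training instead); `train_on_partition` is a hypothetical stand-in for continuing to boost an existing model on one machine's data.

```python
# Sketch of the hypothesized online-learning-style scheme: each machine's
# data is consumed in sequence, and the model from step i is the starting
# point for step i+1.

def train_on_partition(model, partition):
    # Hypothetical: continue boosting `model` on one partition.
    # Here we only record the visit order to show the data flow.
    return model + [partition]

partitions = ["local data 1", "local data 2", "local data 3"]

model = []  # start with an empty model
for part in partitions:
    model = train_on_partition(model, part)

print(model)
```

Under this scheme the partitions would be visited one after another rather than in parallel, which is what makes it resemble online learning.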
Could you help me understand how the aggregation of XGBoost models works here?