jinlow / forust

A lightweight gradient boosted decision tree package.
https://jinlow.github.io/forust/
Apache License 2.0

Memory Consumption of GradientBooster Remains Constant Despite Increased Iterations - Comparison to XGBoost and LightGBM #103

Open DmitrySorda opened 7 months ago

DmitrySorda commented 7 months ago

Hi there!

I'm using the forust library and have noticed a curious behavior regarding memory consumption. In popular libraries like XGBoost and LightGBM, memory usage increases significantly with a higher number of trees (controlled by num_iterations or n_estimators), but the memory footprint of GradientBooster seems to remain constant.

For example, .set_iterations(20) and .set_iterations(10000) result in the same memory usage (around 2.9 GB) on my dataset.

Here's how I'm setting up the model:

let mut model = GradientBooster::default()
    .set_learning_rate(0.001)
    .set_parallel(true)
    .set_iterations(20);
model.fit_unweighted(&matrix, &y, None)?;
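
One rough way to check this across iteration counts (just a sketch: it reuses the matrix and y built as above, retrains from scratch for each count, and reads VmRSS from /proc/self/status, so it is Linux-specific):

// Read the process's resident set size in kB from /proc (Linux only).
fn rss_kb() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
}

// With `matrix` and `y` built as above, inside a function returning a Result:
for iters in [20, 200, 2000] {
    let mut model = GradientBooster::default()
        .set_learning_rate(0.001)
        .set_parallel(true)
        .set_iterations(iters);
    let before = rss_kb().unwrap_or(0);
    model.fit_unweighted(&matrix, &y, None)?;
    let after = rss_kb().unwrap_or(0);
    println!("iterations={iters}: rss before={before} kB, after={after} kB");
}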

Could you shed some light on why this is happening? Is there a specific mechanism within the library that manages memory differently compared to XGBoost and LightGBM?

I'm interested in understanding the underlying reasons for this behavior and any potential implications it might have on performance or scalability.

jinlow commented 7 months ago

Hi, I will need to look at this a little more. Do you know, for this dataset, what the minimum and maximum memory usage is for either LightGBM or XGBoost? For 20 XGBoost trees, is it also 2.9 GB? And for these other libraries, does memory increase and stay high, or does it go down after training?

My first hunch is that it has something to do with how parallelism is handled, but that wouldn't explain why memory in those libraries continually increases with the number of iterations.

jinlow commented 7 months ago

@DmitrySorda While I would still be curious to see the stats on this data compared to XGBoost/LightGBM, the more I think about it, the more what you are seeing with forust makes sense. The data is created once, and then references to that data are passed around during training. The trees themselves are quite small in memory, so it isn't really surprising that RAM usage barely changes between a large model and a small one: most of the RAM is just the initial data being passed to the model.
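
As a rough back-of-the-envelope (the node size and dataset shape below are made-up numbers for illustration, not forust's actual node layout):

// Back-of-the-envelope comparison of tree memory vs. training-data memory.
// All sizes here are illustrative assumptions, not forust's actual layout.
fn main() {
    let n_rows: u64 = 10_000_000;
    let n_cols: u64 = 30;
    let data_bytes = n_rows * n_cols * 8; // f64 features -> ~2.4 GB held for the whole fit

    let max_depth: u64 = 5;
    let nodes_per_tree = (1u64 << (max_depth + 1)) - 1; // full binary tree: 63 nodes
    let bytes_per_node: u64 = 64; // assumed: split feature, threshold, weights, child ids

    for n_trees in [20u64, 10_000] {
        let tree_bytes = n_trees * nodes_per_tree * bytes_per_node;
        println!(
            "{n_trees} trees ~ {:.1} MB vs data ~ {:.1} GB",
            tree_bytes as f64 / 1e6,
            data_bytes as f64 / 1e9
        );
    }
}

Even at 10,000 trees the model itself comes out to a few tens of megabytes, which disappears next to a multi-gigabyte feature matrix.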

jinlow commented 7 months ago

Hi @DmitrySorda, any more thoughts or additional info you can share? Otherwise I'll likely close this. Thanks

deadsoul44 commented 7 months ago

LightGBM might be keeping grad and hess stats of every boosting round. You can check if the total increase is around 32*2*n_rows*n_rounds.
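
Reading the 32 as 32-bit grad and hess values per row per round, a quick check with placeholder numbers would look like this (n_rows and n_rounds here are hypothetical; substitute the real values from the run):

// Rough check of the suggested formula: 32-bit grad + 32-bit hess per row per round.
fn main() {
    let n_rows: u64 = 1_000_000;  // placeholder
    let n_rounds: u64 = 1_000;    // placeholder
    let bits = 32 * 2 * n_rows * n_rounds;
    println!("expected increase ~ {:.1} GB", bits as f64 / 8.0 / 1e9);
}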