ogrisel / pygbm

Experimental Gradient Boosting Machines in Python with numba.
MIT License

Allocate ordered_gradients and ordered_hessians only once #22

Closed NicolasHug closed 6 years ago

NicolasHug commented 6 years ago

Opening this issue so that we don't forget there is room for improvement on this matter.

We could allocate ordered_gradients and ordered_hessians once and for all in the SplittingContext object instead of doing the allocation for every node.

Also, we could populate them in parallel.
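A minimal sketch of the idea (not pygbm's actual code; the class layout and the fill_ordered helper are assumptions, only the ordered_gradients / ordered_hessians names come from this issue): allocate the two buffers once, sized for the full training set, and refill only the prefix needed by the current node, in parallel with numba's prange.

```python
import numpy as np
from numba import njit, prange


class SplittingContext:
    def __init__(self, gradients, hessians):
        self.gradients = gradients
        self.hessians = hessians
        n_samples = gradients.shape[0]
        # Allocated once for the whole training set; each node only uses
        # the first n_samples_at_node entries.
        self.ordered_gradients = np.empty(n_samples, dtype=gradients.dtype)
        self.ordered_hessians = np.empty(n_samples, dtype=hessians.dtype)


@njit(parallel=True)
def fill_ordered(sample_indices, gradients, hessians,
                 ordered_gradients, ordered_hessians):
    # Populate the first len(sample_indices) entries of the shared buffers
    # in parallel, one entry per sample at the current node.
    for i in prange(sample_indices.shape[0]):
        ordered_gradients[i] = gradients[sample_indices[i]]
        ordered_hessians[i] = hessians[sample_indices[i]]
```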

I tried doing this in 1b073b1fc8d5e620f84859177a1021ae231b7fa6. It passes the tests but it still allocates the arrays for each node because I forgot to remove this line (><).

When I did remove it, and after patching the places where ordered_gradients is used (e.g. changing sum_gradients = ordered_gradients.sum() into sum_gradients = ordered_gradients[:n_samples].sum()), the tests don't pass anymore. It could be a numba bug similar to https://github.com/numba/numba/issues/3459, but I'm not certain I didn't miss something myself.
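A hypothetical usage example built on the sketch above (the data and index values are made up): since the buffers are now sized for the whole training set, every reduction at a node has to be restricted to the first n_samples entries, otherwise stale values from other nodes leak into the sums.

```python
rng = np.random.RandomState(0)
gradients = rng.normal(size=100).astype(np.float32)
hessians = np.ones_like(gradients)
context = SplittingContext(gradients, hessians)

sample_indices = np.arange(40, dtype=np.uint32)  # samples at this node
n_samples = sample_indices.shape[0]
fill_ordered(sample_indices, context.gradients, context.hessians,
             context.ordered_gradients, context.ordered_hessians)

# Summing the whole buffer would include entries left over from other nodes,
# so only the valid prefix is reduced:
sum_gradients = context.ordered_gradients[:n_samples].sum()
sum_hessians = context.ordered_hessians[:n_samples].sum()
```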

NicolasHug commented 6 years ago

but I'm not certain I didn't miss something myself.

And I did ^^

Fixed in #33