Closed ogrisel closed 6 years ago
At this time some tests are broken (need to be updated) and the code runs slightly slower than on master for some reason I do not understand yet (on my laptop the speed is the same as on master).
The memory leak issue should be fixed though.
The fact that we observe a slowdown when we update the packed histograms array in the parallel for loop, while we do not observe this issue on master where the packed histograms array is filled sequentially, might be a case of false sharing.
@NicolasHug noted that false sharing might not be a problem in a write-only data structure updated in a parallel for loop. I don't know.
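For illustration only (this is not pygbm's actual layout), one standard way to rule out false sharing is to pad each worker's output row up to a cache-line boundary, so that concurrent writers never touch the same 64-byte line. A minimal NumPy sketch, assuming a 64-byte cache line:

```python
import numpy as np

CACHE_LINE_BYTES = 64  # typical x86 cache line size (assumption)

def padded_histograms(n_features, n_bins, dtype=np.float64):
    """Allocate one histogram row per feature, with each row padded to a
    whole number of cache lines so rows written by different threads
    cannot share a cache line. Returns a view exposing only the real bins."""
    itemsize = np.dtype(dtype).itemsize
    per_line = CACHE_LINE_BYTES // itemsize
    # round the row length up to a multiple of the cache line
    padded_bins = -(-n_bins // per_line) * per_line
    buf = np.zeros((n_features, padded_bins), dtype=dtype)
    return buf[:, :n_bins]

hist = padded_histograms(n_features=4, n_bins=30)
assert hist.shape == (4, 30)
# each row starts on a fresh cache line
assert hist.strides[0] % CACHE_LINE_BYTES == 0
```

If padding like this made the parallel fill as fast as the sequential one, that would point at false sharing; if not, the slowdown is elsewhere.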
What I observe, though, is that on a 12-core machine the code runs 2x faster with `"tbb"` as the `numba.config.THREADING_LAYER` than with `"workqueue"`. `"omp"` performance is closer to `"tbb"` than to `"workqueue"`. But even with `"tbb"`, LightGBM is significantly faster than pygbm on this machine.
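For reference, the threading layer can be selected before numba is first used, either via `numba.config.THREADING_LAYER` or via the `NUMBA_THREADING_LAYER` environment variable; the `"tbb"` layer additionally requires the `tbb` package to be installed. A minimal sketch:

```python
import os

# Must be set before numba compiles any parallel function.
# "tbb" needs the tbb package installed; "omp" needs an OpenMP runtime.
os.environ["NUMBA_THREADING_LAYER"] = "tbb"

# import numba  # numba picks the layer up lazily, at first parallel use
```

Equivalently, from the shell: `NUMBA_THREADING_LAYER=tbb python bench.py`.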
Actually I tried again with master and the various threading backends and this PR is either as fast or faster than master. I must have done something wrong when I reported the initial slow down.
In any case, tbb is significantly faster than the workqueue backend when the number of cores is large (e.g. 12 in my case).
@NicolasHug I have to go, feel free to update the failing tests and merge this PR.
Merging #36 into master will increase coverage by 0.02%. The diff coverage is 100%.
```diff
@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   94.34%   94.36%   +0.02%
==========================================
  Files           8        8
  Lines         778      781       +3
==========================================
+ Hits          734      737       +3
  Misses         44       44
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| pygbm/splitting.py | `99.47% <100%> (ø)` | :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 154def0...0960a2f. Read the comment docs.
Here are updated versions of the plots from https://github.com/ogrisel/pygbm/issues/31#issuecomment-435626534. I don't know how I feel about the 1e7 case.
We now have the following results with the benchmark from https://github.com/ogrisel/pygbm/issues/30#issue-376370377 (numba is pre-compiled here for a fair comparison):
Laptop with 8GB RAM, i5 7th gen.
LightGBM: 75.408s, ROC AUC: 0.8293
pygbm: 83.022s, ROC AUC: 0.8156
No VIRT explosion :smile:
Also, I tried to reproduce the leak with a minimal example and this is what I came up with. It's pretty weird and I'd like to know if you can reproduce it before submitting it to numba because I feel like I'm tripping:
@ogrisel I'm happy to merge as is, but I'd like your input on the 1e7 case first.
Cool, let's merge as it's already a net improvement.
About the minimal reproduction case: I confirm I get the leak with your code, without any error message or exception — just memory usage increasing, as reported by psutil.
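For anyone re-running the reproduction, here is a minimal way to watch resident memory from inside the process. This sketch uses the stdlib `resource` module rather than psutil (assumption: Linux, where `ru_maxrss` is reported in KiB; psutil's `Process().memory_info().rss` gives an equivalent reading cross-platform):

```python
import resource

def peak_rss_mib():
    """Peak resident set size of this process in MiB (Linux: ru_maxrss is KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

before = peak_rss_mib()
junk = [bytearray(1024 * 1024) for _ in range(50)]  # hold ~50 MiB alive
after = peak_rss_mib()
print(f"peak RSS: {before:.1f} -> {after:.1f} MiB")
# In the leaking case, the reading keeps climbing across loop iterations
# even after the references are dropped and gc.collect() has run.
assert after >= before
```

Sampling RSS once per iteration of the reproduction loop is enough to see the monotone growth.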
We still have a discrepancy in terms of results with LightGBM though. But there is another issue for that.
I have opened https://github.com/numba/numba/issues/3473 and https://github.com/numba/numba/issues/3472 regarding the leak and some other weird stuff I found.
Tentative fix for #31.