Open ryanhausen opened 3 weeks ago
Attention: Patch coverage is 89.79592%
with 10 lines
in your changes missing coverage. Please review.
Project coverage is 80.27%. Comparing base (
1d970b0
) to head (d733508
). Report is 3 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
treeple/stats/forest.py | 81.39% | 5 Missing and 3 partials :warning: |
treeple/stats/utils.py | 96.36% | 0 Missing and 2 partials :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@adam2392 I have updated the PR, let me know if I missed anything or if you want to see some more changes.
@adam2392 I added some documentation to address your comments. Let me know what you think.
I will let @sampan501 @SUKI-O @YuxinB @PSSF23 take a look as well.
Reference Issues/PRs
Fixes #310
What does this implement/fix? Explain your changes.
This adds a sparse implementation of
build_coleman_forest
totreeple.stats
. The sparse implementation relies onscipy.stats.sparse
and so only works in the case of binary classification and regression. In my tests usingcProfile
/memray
, the sparse implementation is a little over 50% faster and uses about %7 less memory, see pdf for more information. I am also attaching the profiling data if you'd like to take a look.This PR also includres a change to the bottleneck implementation to for the dense implementation of
build_coleman_forest
. Specifically there are two important changes:importlib.util.find_spec("bottleneck")
. The old check usingsys
seemed to work fine and passed tests, but actually didn't seem work unless bottleneck had been imported before, which when the tests are run is true, but may not be the case in other situationsnanmean_f
andanynan_f
are now defined usingdef
so that the warning message can moved into their respective calls rather than at import.Any other comments?
The scipy.stats sparse implementation should be swapped out for pydata.sparse when that implementation is performant enough, as it is more general.