Closed ryanhausen closed 1 month ago
I would categorize bottleneck as an optional dependency within a new category called 'extra'
.
Feel free to add the relevant entry here: https://github.com/neurodata/treeple/blob/main/pyproject.toml
In addition, can you add a CHANGELOG entry?
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 78.66%. Comparing base (
dd28c41
) to head (7244b98
). Report is 4 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Do you know how other packages test optional dependencies? It would be great to add a unit-test that runs the stuff w/ and w/o bottleneck to assert the answer is the same.
EDIT: Sorry I didn't notice your comment above. Ya I wanted to add a test too. Right now bottleneck only touches two things, _non_nan_samples
and _parallel_build_null_forests
. I added a test for _non_nan_samples
, but _parallel_build_null_forests
does quite a bit and doesn't seem to have it's own test. It could be refactored into multiple more modular functions and then they could all be separately tested, but to follow the existing testing structure, I modified one of the other tests to turn off/on bottleneck.
@adam2392 Would you mind taking a look again. I made a couple changes to make it testable, added a test, and changed an existing test. I am failing codecov, but I'm not sure where, the link from codecov isn't showing me the file. I am sure it's me.
It shows this:
Interesting, ok thanks. I'll fix it.
@adam2392 I fixed coverage locally. However, by design, bottleneck isn't included in the build because it's optional. I can add it to the build or test requirements files and that would force it into the CI process. Do you have a preference for which one?
Yes in the CI that tests coverage, we should add the extra group of installs.
@adam2392 I got it added to the build step by overriding pip install in spin. It looks like the mac builds for python 3.9/10 aren't happy about something, but the others are ok. I don't have a mac to test it. Do you know have you seen this behavior before?
build_and_test_slow uses spin for the install too right? Would installing the package via pip not create other issues rather than using spin?
Would you mind rerunning those actions? If it doesn't go away, then maybe if there is an issue with bottleneck, that we need to know about
build_and_test_slow uses spin for the install too right? Would installing the package via pip not create other issues rather than using spin?
No for some unfortunate reasons, we use pip install from a requirements file: https://github.com/neurodata/treeple/blob/1e62fe3a9a3323bc2bfe1f7abbffdf5c32256c50/.github/workflows/main.yml#L180-L186
So I would just add a pip install .[extra]
step there, revert the changes in .spin/cmds.py
, and see if that addresses the coverage issue.
Note ./spin install
will not install anything besides the treeple
package, and its required dependencies, so all extra installations, such as for docs, testing, and style checks are separate.
Now that I think about it, you can also add bottleneck to the test
group, since we always want it for unit-testing. If you go this route, you would also just add to the test_requirements.txt
file.
@adam2392 would you mind taking a look when you have the chance.
@adam2392 sorry I missed your comment I merged with latest from main and updated the version in the docs. Looks like there was a weird error with one of the Mac builds. Doesn't seem related to the code though.
Cool. Thanks for the PR @ryanhausen! Can you take a look at what the differential is when using jovo's suggestion?
I.e. pre-allocate less nans than needed
Yep!
Reference Issues/PRs
None
What does this implement/fix? Explain
Changes nan related calculations in treeple.stats.utils to use bottleneck rather numpy. In my testing, this seems to offer a significant performance speed up, see attached doc: profiling.pdf
I am also attached the cProfile outputs for the benchmarking in the above pdf. profiles.zip
Any other comments?
I am opening as a draft as I am not sure if you want to make bottleneck a required dependency or optional. And if optional, how you want to categorize it.
Looking forward to your feedback!