neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

Bottleneck should not raise a warning unless relevant function in treeple.stats is called #310

Open adam2392 opened 1 month ago

adam2392 commented 1 month ago

These lines seem to run regardless of whether treeple.stats is actually used, since they sit at module level and execute on import:

if BOTTLENECK_AVAILABLE and DISABLE_BN_ENV_VAR not in os.environ:
    nanmean_f = bn.nanmean
    anynan_f = lambda arr: bn.anynan(arr, axis=2)
else:
    warnings.warn(
        "Not using bottleneck for calculations involvings nans. Expect slower performance."
    )
    nanmean_f = np.nanmean
    anynan_f = lambda arr: np.isnan(arr).any(axis=2)

So I get this warning consistently, no matter what:

/Users/adam2392/Documents/treeple/treeple/stats/utils.py:34: UserWarning: Not using bottleneck for calculations involvings nans. Expect slower performance.
  warnings.warn(

@ryanhausen do you mind submitting a PR to patch this behavior?

ryanhausen commented 1 month ago

@adam2392 sure! The warning should only have shown when treeple.stats is imported. Were you seeing it at other times?

adam2392 commented 1 month ago

Yeah, I think it's because we do imports in __init__.py

I would take a look at the strategy or design sklearn uses to soft-import optional dependencies, e.g. polars or joblib.
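For illustration, a soft import can be as small as the sketch below. It is not sklearn's actual helper: the name `optional_import` is hypothetical, and it only assumes that a missing optional package should yield `None` rather than an error or a warning at import time.

```python
import importlib


def optional_import(name):
    """Return the module called `name`, or None if it is not installed.

    Hypothetical helper for illustration; nothing is warned or raised here,
    so importing the package that defines it stays silent.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# The caller decides what to do when the optional dependency is absent.
bn = optional_import("bottleneck")
BOTTLENECK_AVAILABLE = bn is not None
```

The point of the design is that the decision to warn or fall back moves out of the import and into the code path that actually needs the dependency.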

adam2392 commented 1 month ago

Just to clarify, @ryanhausen: rather than warning during import, it should only warn the user when the function is used.

ryanhausen commented 1 month ago

@adam2392 agreed.