neurodata / scikit-learn

scikit-learn-tree fork: A fork that enables extensions of Python and Cython API for decision trees
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Scarliles/defuse partitioner #70

Open SamuelCarliles3 opened 2 months ago

SamuelCarliles3 commented 2 months ago

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Defuses Partitioner so that classes which hold a concrete instance no longer need a separate concrete implementation for each Partitioner subtype.
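As a rough illustration of the idea (not the PR's actual Cython code), the sketch below assumes that "defusing" means giving Partitioner a single common base type in place of a per-subtype specialization, so a class that holds a partitioner is written once. The names `DensePartitioner`, `SparsePartitioner`, `Splitter`, `sort_values`, and `candidate_thresholds` are hypothetical and chosen only for the example.

```python
from abc import ABC, abstractmethod


class Partitioner(ABC):
    """Common base type; holders depend only on this interface."""

    @abstractmethod
    def sort_values(self, values):
        """Return the feature values in sorted order."""


class DensePartitioner(Partitioner):  # hypothetical subtype
    def sort_values(self, values):
        return sorted(values)


class SparsePartitioner(Partitioner):  # hypothetical subtype
    def sort_values(self, values):
        # Toy stand-in: treat missing entries (None) as implicit zeros.
        return sorted(0.0 if v is None else v for v in values)


class Splitter:
    """Written once: holds any Partitioner and dispatches at run time."""

    def __init__(self, partitioner: Partitioner):
        self.partitioner = partitioner

    def candidate_thresholds(self, values):
        ordered = self.partitioner.sort_values(values)
        # Midpoints between consecutive distinct values.
        return [(a + b) / 2 for a, b in zip(ordered, ordered[1:]) if a != b]
```

With this shape, `Splitter(DensePartitioner())` and `Splitter(SparsePartitioner())` share one `Splitter` definition; the trade-off is a run-time dispatch on each `sort_values` call.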

Any other comments?

The asv benchmarks run fine in my Linux dev VM, but fail in setup_cache on my M2 MacBook...

github-actions[bot] commented 2 months ago

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff

ruff detected issues. Please run `ruff check --fix --output-format=full .` locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.

```
examples/linear_model/plot_tweedie_regression_insurance_claims.py:82:35: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
   |
81 |     # unquote string fields
82 |     for column_name in df.columns[df.dtypes.values == object]:
   |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
83 |         df[column_name] = df[column_name].str.strip("'")
84 |     return df.iloc[:n_samples]
   |

sklearn/cluster/_optics.py:327:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
325 |         """
326 |         dtype = bool if self.metric in PAIRWISE_BOOLEAN_FUNCTIONS else float
327 |         if dtype == bool and X.dtype != bool:
    |            ^^^^^^^^^^^^^ E721
328 |             msg = (
329 |                 "Data will be converted to boolean for"
    |

sklearn/cluster/tests/test_dbscan.py:294:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
292 |     obj = DBSCAN()
293 |     s = pickle.dumps(obj)
294 |     assert type(pickle.loads(s)) == obj.__class__
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
    |

sklearn/linear_model/tests/test_ridge.py:1023:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1022 |     assert len(ridge_cv.coef_.shape) == 1
1023 |     assert type(ridge_cv.intercept_) == np.float64
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1024 |
1025 |     cv = KFold(5)
     |

sklearn/linear_model/tests/test_ridge.py:1031:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1030 |     assert len(ridge_cv.coef_.shape) == 1
1031 |     assert type(ridge_cv.intercept_) == np.float64
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
     |

sklearn/metrics/pairwise.py:2364:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
2362 |         dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else "infer_float"
2363 |
2364 |         if dtype == bool and (X.dtype != bool or (Y is not None and Y.dtype != bool)):
     |            ^^^^^^^^^^^^^ E721
2365 |             msg = "Data was converted to boolean for metric %s" % metric
2366 |             warnings.warn(msg, DataConversionWarning)
     |

sklearn/model_selection/_search.py:1100:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1098 |                 arr_dtype = np.dtype(object)
1099 |             else:
1100 |                 if any(np.min_scalar_type(x) == object for x in param_list):
     |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1101 |                     # `np.result_type` might get thrown off by `.dtype` properties
1102 |                     # (which some estimators have).
     |

sklearn/model_selection/_search.py:1107:52: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1105 |                     # https://github.com/scikit-learn/scikit-learn/issues/29157
1106 |                     arr_dtype = np.dtype(object)
1107 |             if len(param_list) == n_candidates and arr_dtype != object:
     |                                                    ^^^^^^^^^^^^^^^^^^^ E721
1108 |                 # Exclude `object` else the numpy constructor might infer a list of
1109 |                 # tuples to be a 2d array.
     |

sklearn/model_selection/_split.py:2899:27: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
2897 |                 if value is None and hasattr(self, "cvargs"):
2898 |                     value = self.cvargs.get(key, None)
2899 |             if len(w) and w[0].category == FutureWarning:
     |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
2900 |                 # if the parameter is deprecated, don't show it
2901 |                 continue
     |

sklearn/model_selection/tests/test_validation.py:589:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
588 |             # Make sure all the arrays are of np.ndarray type
589 |             assert type(cv_results["test_r2"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:590:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
588 |             # Make sure all the arrays are of np.ndarray type
589 |             assert type(cv_results["test_r2"]) == np.ndarray
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
591 |             assert type(cv_results["fit_time"]) == np.ndarray
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:591:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
589 |             assert type(cv_results["test_r2"]) == np.ndarray
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:592:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
593 |
594 |             # Ensure all the times are within sane limits
    |

sklearn/utils/estimator_checks.py:1509:8: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1508 |     # func can output tuple (e.g. score_samples)
1509 |     if type(result_full) == tuple:
     |        ^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1510 |         result_full = result_full[0]
1511 |         result_by_batch = list(map(lambda x: x[0], result_by_batch))
     |

sklearn/utils/tests/test_validation.py:1343:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1341 |     )
1342 |     assert str(raised_error.value) == str(err_msg)
1343 |     assert type(raised_error.value) == type(err_msg)
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
     |

sklearn/utils/validation.py:874:49: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
872 |         if all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig):
873 |             dtype_orig = np.result_type(*dtypes_orig)
874 |         elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
    |                                                 ^^^^^^^^^^^ E721
875 |             # Force object if any of the dtypes is an object
876 |             dtype_orig = object
    |

Found 16 errors.
```
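For readers unfamiliar with E721, the following standard-library sketch shows the two rewrites the message suggests and how they differ. It is illustrative only and is not taken from the flagged files; `Point` and the variable names are made up for the example.

```python
import warnings
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
result = Point(1, 2)

# Flagged by E721: equality comparison between types.
# type(result) == tuple

# Exact type check: False here because Point is a subclass of tuple.
exact_match = type(result) is tuple

# Subclass-aware check: True for tuple and any of its subclasses.
subclass_match = isinstance(result, tuple)

# Warning categories are classes too, so the same rewrite applies.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("deprecated parameter", FutureWarning)

category_is_future = caught[0].category is FutureWarning
```

Comparisons against NumPy dtypes, such as `X.dtype != bool`, are a different case: `==` on a dtype coerces the right-hand side to a dtype before comparing, so swapping in `is` there would change behavior rather than just style.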

cython-lint

cython-lint detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed cython-lint version is cython-lint=0.16.2.

```
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pxd:13:1: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pxd:71:90: W291 trailing whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:9:40: 'swap' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:16:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:36:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:539:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:540:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:541:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:542:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:543:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:544:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:550:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:551:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:552:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:553:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:554:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_sort.pxd:13:5: E128 continuation line under-indented for visual indent
```

Generated for commit: 09a8ec5. Link to the linter CI: here