wfondrie / mokapot

Fast and flexible semi-supervised learning for peptide detection in Python
https://mokapot.readthedocs.io
Apache License 2.0
40 stars 14 forks source link

changed metadata import to allow the groups arg #86

Closed jspaezp closed 1 year ago

jspaezp commented 1 year ago

solves: https://github.com/wfondrie/mokapot/issues/85 Also adds multiple python versions to the test matrix in the github worflows

jspaezp commented 1 year ago

https://github.com/wfondrie/mokapot/blob/f2eea1d209c2ce9739cb52076c7834b9f9f73192/tests/unit_tests/test_brew.py#L25

This test fails ... with error AttributeError: 'RandomForestClassifier' object has no attribute 'best_params_'; mokapot/model.py:642: AttributeError

https://github.com/wfondrie/mokapot/blob/f2eea1d209c2ce9739cb52076c7834b9f9f73192/mokapot/model.py#L633-L647

I think the issue is that ... Mokapot uses the existance of the 'estimator' attribute as an indicator of it being a model that has not undergone hparam optimization ...

https://github.com/wfondrie/mokapot/blob/f2eea1d209c2ce9739cb52076c7834b9f9f73192/mokapot/model.py#L157-L160

nontheless, the RandomForest needs has an estimator attr but does not have a bestparams (attrs: ['estimator', 'n_estimators', 'estimator_params', 'base_estimator', 'bootstrap', 'oob_score', 'n_jobs', 'random_state', 'verbose', 'warm_start', 'class_weight', 'max_samples', 'criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'ccp_alpha', 'n_features_in_', 'n_outputs_', 'classes_', 'n_classes_', '_estimator', 'estimators_'])

https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef5ee2a8aea80498388690e2213118efd/sklearn/ensemble/_forest.py#L1407

Thus making the RF model to fail in the test.... I am not sure what else could be done to check if the model passed actually requires hparam search. isinstance(xxx, BaseSearchCV)

codecov-commenter commented 1 year ago

Codecov Report

Merging #86 (33caa15) into master (f2eea1d) will increase coverage by 0.10%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #86      +/-   ##
==========================================
+ Coverage   84.77%   84.87%   +0.10%     
==========================================
  Files          19       19              
  Lines        1622     1620       -2     
==========================================
  Hits         1375     1375              
+ Misses        247      245       -2     
Impacted Files Coverage ฮ”
mokapot/model.py 87.00% <100.00%> (+0.06%) :arrow_up:
mokapot/plugins/__init__.py 88.88% <100.00%> (+5.55%) :arrow_up:

:mega: Weโ€™re building smart automated test selection to slash your CI/CD build times. Learn more

wfondrie commented 1 year ago

Ah yes - testing for an estimator was a bit of a hack to duck type for a GridSearchCV, RandomizedSearchCV, etc object. I'm actually surprised it took this long to rear its head ๐Ÿ™ƒ

wfondrie commented 1 year ago

Perfect! I'll merge once CI has completed successfully ๐Ÿ™Œ