mlgig / mrsqm

GNU General Public License v3.0

`MrSQMClassifier`'s `random_state` parameter seems ineffective for `predict_proba` if `nsfa>0` #11

Closed: fkiraly closed this issue 9 months ago

fkiraly commented 1 year ago

It seems like the pickling bug is indeed fixed!

This now makes it possible to run the save/load tests, which indicate that, at least for predict_proba, setting random_state does not make the output deterministic as it should when nsfa>0 (with nsfa=0 the same tests pass).

In theory, the cause could also be the pickling rather than random_state; in that case my best guess would be a loss of numerical precision during serialization or deserialization.
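One way to tell the two hypotheses apart is to check them separately: refit twice with the same random_state, and independently compare a fitted estimator with its pickled copy. The sketch below is only an illustration of that idea. `ToyClassifier` and `check_reproducibility` are hypothetical names standing in for the real estimator and the sktime tests, not part of mrsqm or sktime.

```python
import pickle

import numpy as np


class ToyClassifier:
    """Hypothetical stand-in for an estimator with a random_state parameter."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # The random weights are the only source of randomness in this toy model.
        rng = np.random.default_rng(self.random_state)
        self.w_ = rng.normal(size=X.shape[1])
        return self

    def predict_proba(self, X):
        p = 1.0 / (1.0 + np.exp(-(X @ self.w_)))
        return np.column_stack([1.0 - p, p])


def check_reproducibility(make_estimator, X, y, atol=1e-6):
    """Return (fit_idempotent, pickle_stable) for a fresh-estimator factory."""
    first = make_estimator().fit(X, y)
    second = make_estimator().fit(X, y)
    # Check 1: two independent fits with identical parameters agree.
    fit_idempotent = bool(
        np.allclose(first.predict_proba(X), second.predict_proba(X), atol=atol)
    )
    # Check 2: a pickle round trip of one fitted estimator changes nothing.
    restored = pickle.loads(pickle.dumps(first))
    pickle_stable = bool(
        np.allclose(first.predict_proba(X), restored.predict_proba(X), atol=atol)
    )
    return fit_idempotent, pickle_stable


X = np.random.default_rng(0).normal(size=(10, 3))
y = np.array([0, 1] * 5)

seeded = check_reproducibility(lambda: ToyClassifier(random_state=42), X, y)
unseeded = check_reproducibility(lambda: ToyClassifier(random_state=None), X, y)
print("seeded:", seeded)
print("unseeded:", unseeded)
```

If the first check fails while the second passes, the randomness is not controlled by random_state; if only the second fails, the serialization is the more likely culprit.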

Error message below, full error log is in https://github.com/sktime/sktime/actions/runs/6000661723/job/16273328621?pr=5171

=========================== short test summary info ============================
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_save_estimators_to_file[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between saved and loaded estimator MrSQM
Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00205801
Max relative difference: 0.13931118
 x: array([[0.992248, 0.007752],
       [0.017839, 0.982161],
       [0.023511, 0.976489],...
 y: array([[0.992762, 0.007238],
       [0.018737, 0.981263],
       [0.024453, 0.975547],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_fit_idempotent[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals

Mismatched elements: 10 / 10 (100%)
Max absolute difference: 0.00414175
Max relative difference: 0.19419357
 x: array([[0.992142, 0.007858],
       [0.017063, 0.982937],
       [0.02547 , 0.97453 ],...
 y: array([[0.993322, 0.006678],
       [0.01615 , 0.98385 ],
       [0.021328, 0.978672],...
FAILED sktime/tests/test_all_estimators.py::TestAllEstimators::test_persistence_via_pickle[MrSQM-1-ClassifierFitPredict-predict_proba] - AssertionError: 
Arrays are not almost equal to 6 decimals
Results of predict_proba differ between when pickling and not pickling, estimator MrSQM
Mismatched elements: 6 / 10 (60%)
Max absolute difference: 0.00174577
Max relative difference: 0.11265353
 x: array([[0.992649, 0.007351],
       [0.013751, 0.986249],
       [0.019901, 0.980099],...
 y: array([[0.992649, 0.007351],
       [0.015497, 0.984503],
       [0.019901, 0.980099],...
====== 3 failed, 69 passed, 157 skipped, 4 warnings in 102.22s (0:01:42) =======

lnthach commented 11 months ago

Hi @fkiraly, thanks for pointing this out. This should now be fixed in the new version (0.0.5). It was caused by a bug in setting the SFA hyperparameters, together with Python's set behaving unexpectedly.
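The comment above does not say which set behaviour was involved. A common cause of this kind of non-determinism, assumed here purely for illustration, is that the iteration order of a set of strings depends on Python's per-process hash randomization (`PYTHONHASHSEED`), whereas `sorted()` gives a stable order. The helper name and the example strings below are hypothetical and are not taken from the mrsqm code.

```python
import os
import subprocess
import sys

# Hypothetical parameter names; any set of strings shows the same effect.
RAW_SNIPPET = "print(list({'sax', 'sfa', 'word', 'window', 'alphabet'}))"
SORTED_SNIPPET = "print(sorted({'sax', 'sfa', 'word', 'window', 'alphabet'}))"


def run_with_hash_seed(code, seed):
    """Run `code` in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Collect the distinct orderings observed across several hash seeds.
raw_orders = {run_with_hash_seed(RAW_SNIPPET, seed) for seed in range(5)}
sorted_orders = {run_with_hash_seed(SORTED_SNIPPET, seed) for seed in range(5)}

print("distinct orders when iterating the set directly:", len(raw_orders))
print("distinct orders after sorted():", len(sorted_orders))
```

If random draws are consumed while iterating over such a set, the same random_state can yield different results between processes, which is why sorting before iterating is a frequent remedy.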

fkiraly commented 9 months ago

Tests pass now; this is resolved.