Open fkiraly opened 1 year ago
Thanks @fkiraly,
Re problem 1, I was actually able to pickle MrSQMClassifier, however, only if SFA is not used. For example, this works for me.
clf = MrSQMClassifier(nsax=1,nsfa=0).fit(X_train, y_train)
pickle.dumps(clf)
But this provoked the same TypeError
clf = MrSQMClassifier(nsax=0,nsfa=1).fit(X_train, y_train)
pickle.dumps(clf)
There is not much difference between the SAX and SFA Cython wrappers code so I suspect the problem is in the C++ code. A quick and easy solution would be replacing the SFA C++ implementation with the new Python implementation (https://github.com/patrickzib/dictionary). This will slow down MrSQM significantly though.
Thanks for identifying the root cause.
I patched over the two issues in https://github.com/sktime/sktime/pull/4337 now, as a consequence you cannot retrieve fitted parameters or use nsfa
greater than 0 with the full contract.
It can be released that way and is scheduled for 0.17.0.
We should leave this open until this is fixed in this package, and the two patches are removed in sktime
.
Patch revert needs to be applied in three places in sktime
(the MrSQM
classifier):
_get_fitted_params
overrideget_test_params
need to include a test case with nsfa
strictly greater 0nsfa
and persistence needs to be removed in docstringMay I hand to you the reponsibility to maintain the sktime
interface from 0.17.0 on?
(e.g., once you fix the interface non-compliance in mrsqm
and have that released, revert the patches too)
A quick and easy solution would be replacing the SFA C++ implementation with the new Python implementation (https://github.com/patrickzib/dictionary). This will slow down MrSQM significantly though.
I think speed is still sth you'd want to maintain.
I think you just need to produce a suitable __reduce__
method, that might just be a few lines.
Not sure how to do that in C code though.
may I kindly ask if there are plans (if yes timeline) on resolving this? Need any help?
?
Hi @fkiraly ,
I just revisited this issue and found a solution. The problem is that MrSQMClassifier stores Cython objects (PySFA) that contain pointers. I made several attempts with the reduce function as well as getstate and setstate but couldn't make it work. The solution I found is that to only store the necessary information in a Python array and initiate the objects on the fly. This should work with pickle now. I also updated the Pypi package. On a different note, I also have put MrSEQL on Pypi (https://pypi.org/project/mrseql/).
Great!
Apologies for the late reply.
The solution I found is that to only store the necessary information in a Python array and initiate the objects on the fly. This should work with pickle now. I
Ok, that should work - I'll try to remove the patch, lower bound the required mrsqm
version, and see what the tests say.
Reference issue: https://github.com/sktime/sktime/issues/4396 Downstream PR that enables tests: https://github.com/sktime/sktime/pull/5171
On a different note, I also have put MrSEQL on Pypi (https://pypi.org/project/mrseql/).
Excellent!
The sktime
issue is here: https://github.com/sktime/sktime/issues/4296
How do we proceed with this? I think there are still users who'd like this back.
It should be easy to whip up an interface to make it searchable in sktime
, just copy-pasting mrsqm
. Would you like to do that, or would you prefer someone else does it?
Update: naive replication of MrSQM
recipe here https://github.com/sktime/sktime/pull/5178
@lnthach, quick question: have you addressed both problem 1 (pickle) and 2 (parameters), or only problem 1? By intention, I mean.
Looks like the pickling bug is fixed!
This unlocks further tests, which seem to indicate there is an issue with the random_state
contract, I've opened a new issue: https://github.com/mlgig/mrsqm/issues/11
@lnthach, quick question: have you addressed both problem 1 (pickle) and 2 (parameters), or only problem 1? By intention, I mean
I only addressed problem 1. Re problem 2 sorry if it's a silly question, what need to be done to fix this problem ?
Re problem 2 sorry if it's a silly question, what need to be done to fix this problem ?
Your estimator has to implement a method get_fitted_params
which returns a str
keyed dict
of parameters once the estimator is fitted. The return can be the empty dict
if there are no fitted parameters to show to the user.
The "quickest" fix would be to add this method to the class:
def get_fitted_params(self):
return {}
though of course you may like to return fitted parameters to show to the user if there are any of interest.
@lnthach, @heerme, I've tried integrating
MrSQM
to distribute it withsktime
(by interface to this package), also as a proof-of-concept of how we could back-integrateMrSEQL
. Issue forMrSQM
: https://github.com/sktime/sktime/issues/4338 Issue forMrSEQL
: https://github.com/sktime/sktime/issues/4296The PR https://github.com/sktime/sktime/pull/4337 works in that it proves that we can integrate cython based estimators with
sktime
tests and the up-to-date framework, if those estimators are in an external package such asmrsql
(this one).Now, with the tests working, it would appear that
MrSQM
is failing two contract tests:_get_fitted_params
, fitted parameter inspectionI assume the same tests would fail if you run
check_estimator
, i.e.,How do we proceed from here?
Problem 1: serialization
I'm not too familiar with cython, the exception when serializing (saving to binary) is
TypeError: no default __reduce__ due to non-trivial __cinit__
To make it work, I suspect one would have to implement serialization via the higher-level interface (
save
andload
) or via the lower-level interface,__reduce__
. @achieveordie has done that previously for classes of estimators, maybe he has more insight.To provoke the error without the test suite, do
pickle(my_estimator)
on a fitted instance ofMrSQMClassifier
(that should work without complaining for an easy solution via the lower-level interface).Problem 2: fitted parameters
The other issue is
get_fitted_params
, this should be implemented either directly or via inheritance fromBaseClassifier
- that can be solved on thesktime
side though.