Open dbehrouz opened 6 years ago
Thanks for reporting this.
The goal of the flow / setup objects is to give a serialization of a Machine Learning object (from weka, scikit-learn, mlr) on OpenML.
Using the following code (not checked, might contain small typo's) you can deserialize it:
import openml
flow = openml.flows.get_flow(6969)
pipeline = openml.flows.flow_to_sklearn(flow)
print(pipeline.get_params()['steps'])
If the order of components is also wrong there we have a serious problem.
@mfeurer what do you think about this?
@janvanrijn I just checked and the order after transforming the flow to sklearn is fine. It seems only the flow object may contain the wrong order. This is not a critical issue, but I just thought I should mention it since I was using the 'flow.components' to print the components of the flow.
Thanks for checking this.
@mfeurer what do you think, should we do something about this?
Thanks for reporting this issue. I won't have time this week, but I will work on the OpenML connector next week, so I will also look into the issues you raised @d-behi.
Actually I think this one is due to the way the flow is stored on the server
Checking the following piece of code, the connector indeed seems to send data correctly to the server:
import openml
import sklearn.pipeline
import sklearn.preprocessing
import sklearn.svm
flow = openml.flows.sklearn_to_flow(sklearn.pipeline.Pipeline(
[
('imputation', sklearn.preprocessing.Imputer()),
('onehot', sklearn.preprocessing.OneHotEncoder(handle_unknown='ignore',
categorical_features=None)),
('variancethreshold', sklearn.feature_selection.VarianceThreshold()),
('classifier', sklearn.svm.SVC())
]
))
print(flow._to_xml())
print(' ===> '.join([c for c in flow.components.keys()]))
and the de-serialization despite the component order being wrong. Is it possible to change the way the flow is stored on the server?
Yes and no. Yes, we could potentially change this.
No, I don't think that the order in which the flows are represented in the XML bears meaning, and that we should add that.
Description
Some of the sklearn pipeline flows have the wrong component order. For example flow ids: {6969, 8330, 7694} where, in the 'component OrderedDict` the mode is the first component.
Steps/Code to Reproduce
Example:
Expected Results
u'imputation ===> hotencoding ===> variencethreshold ===> classifier '
Actual Results
u'classifier ===> imputation ===> hotencoding ===> variencethreshold'
Versions
Darwin-17.5.0-x86_64-i386-64bit ('Python', '2.7.15 (default, May 28 2018, 14:20:45) \n[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]') ('NumPy', '1.13.3') ('SciPy', '1.0.0') ('Scikit-Learn', '0.18.1') ('OpenML', '0.7.0')