RobertArbon closed 7 years ago
The problem is in osprey/execute_worker.py with:
    def build_full_params(xparams):
        # make sure we get _all_ the parameters, including defaults on the
        # estimator class, to save in the database
        params = clone(estimator).set_params(**xparams).get_params()
        params = dict((k, v) for k, v in iteritems(params)
                      if not isinstance(v, BaseEstimator) and
                      (k != 'steps'))
        return params
The problem is that the features parameter of the FeatureSelector is an ordered dictionary of the featurizer objects. It would seem this is not JSON serializable (I'm not an expert on this). So after running the above function on my pipeline object I get:
scaling__copy True
cluster__batch_size 100
tica__lag_time 1
tica__n_components None
cluster__n_init 3
cluster__tol 0.0
tica__shrinkage None
cluster__reassignment_ratio 0.01
msm__n_timescales None
cluster__init k-means++
scaling__with_scaling True
scaling__with_centering True
features__features OrderedDict([('backbone_dihed', DihedralFeaturizer(sincos=True, types=['phi', 'psi'])), ('residues_dihed', DihedralFeaturizer(sincos=True, types=['chi1', 'chi2', 'chi3', 'chi4'])), ('contacts', ContactFeaturizer(contacts='all', ignore_nonprotein=True,
scheme='closest-heavy'))])
cluster__verbose 0
cluster__max_iter 100
msm__sliding_window True
cluster__n_clusters 8
msm__reversible_type mle
tica__kinetic_mapping True
scaling__quantile_range (25.0, 75.0)
features__which_feat ['backbone_dihed', 'residues_dihed', 'contacts']
cluster__compute_labels True
cluster__init_size None
msm__lag_time 80
cluster__max_no_improvement 10
cluster__random_state None
msm__prior_counts 0
msm__verbose False
msm__ergodic_cutoff on
variance_cut__threshold 0.0
It seems that, for the purposes of Osprey, the which_feat parameter is all that is needed.
I'm happy to fix and submit a pull request. A rather hacky fix would be:
    if not (isinstance(v, BaseEstimator) or isinstance(v, OrderedDict)) and
    (k != 'steps'))
Something that tests whether the parameter is JSON serializable might be preferable. Aside from trying to dump it (try: json.dumps(v)), I'm not sure what would be best.
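To illustrate why a serializability test may be preferable to the type-based check, here is a minimal sketch using only the standard library. FakeFeaturizer is a hypothetical stand-in for the real featurizer classes (it is not part of msmbuilder or Osprey), and the parameter values are only illustrative. It suggests that an OrderedDict is rejected by json.dumps only when it holds non-serializable values, so filtering on isinstance(v, OrderedDict) would also drop ordered dictionaries that could have been stored.

```python
import json
from collections import OrderedDict


class FakeFeaturizer(object):
    """Hypothetical stand-in for a featurizer object (not a real class)."""

    def __init__(self, name):
        self.name = name


# An OrderedDict of arbitrary objects, similar in shape to features__features.
features = OrderedDict([
    ("backbone_dihed", FakeFeaturizer("phi/psi")),
    ("contacts", FakeFeaturizer("closest-heavy")),
])

try:
    json.dumps(features)
    features_error = None
except TypeError as exc:
    # json.dumps raises TypeError for objects it has no encoding for.
    features_error = str(exc)

# An OrderedDict of plain values serializes without trouble.
plain = OrderedDict([("lag_time", 80), ("n_clusters", 8)])
plain_json = json.dumps(plain)

# A list of names, like features__which_feat, is also fine.
which_feat_json = json.dumps(["backbone_dihed", "residues_dihed", "contacts"])

print("features failed to serialize:", features_error is not None)
print("plain OrderedDict:", plain_json)
print("which_feat:", which_feat_json)
```
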
Thanks for the report @RobertArbon! I like the idea of making a test for whether the object is serializable or not. Apparently, the try/except method is the best way to go about it: https://stackoverflow.com/a/42033176
Would you be willing to submit a PR to add this function to utils.py?
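For reference, a try/except check of this kind could look roughly like the sketch below. The names is_json_serializable and drop_unserializable are assumptions made for illustration, and the function that ended up in utils.py may differ. The example works on a plain dict with made-up values rather than on a real pipeline, so it does not depend on scikit-learn or Osprey.

```python
import json


def is_json_serializable(obj):
    """Return True if json.dumps accepts obj, False otherwise.

    Hypothetical helper name; assumes the standard-library encoder with
    its default settings.
    """
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        return False
    return True


def drop_unserializable(params):
    """Keep only parameters that can be stored as JSON, skipping 'steps'."""
    return dict((k, v) for k, v in params.items()
                if k != 'steps' and is_json_serializable(v))


# Illustrative parameters; object() stands in for a featurizer instance.
params = {
    'msm__lag_time': 80,
    'scaling__quantile_range': (25.0, 75.0),
    'features__which_feat': ['backbone_dihed', 'residues_dihed', 'contacts'],
    'features__features': {'backbone_dihed': object()},
    'steps': [],
}

cleaned = drop_unserializable(params)
print(sorted(cleaned))
```

One consequence of this design is that estimator instances would be filtered out as well, since json.dumps cannot encode arbitrary objects, so a separate isinstance check against BaseEstimator may become redundant. Whether to keep it for clarity is a judgment call.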
Yeah, sure, I'll get on this tomorrow.
done in #223
Trying to get the FeatureSelector to work with the pipeline in Osprey. It's not playing ball:
Config file:
Error message:
Is there anything I'm doing wrong? Any ideas what's going on?
Many thanks
Rob