msmbuilder / osprey

🦅Hyperparameter optimization for machine learning pipelines 🦅
http://msmbuilder.org/osprey
Apache License 2.0

initialize_trial() can't serialize some estimator parameters #222

Closed RobertArbon closed 7 years ago

RobertArbon commented 7 years ago

In osprey/execute_worker.py:

params = dict((k, v) for k, v in iteritems(params)
              if not isinstance(v, BaseEstimator) and
              (k != 'steps'))

Calling session.commit() fails. I always seem to get something like: raise TypeError(repr(o) + " is not JSON serializable")

I think it's to do with the types that get put in the params dictionary. I've identified these cases in which it fails:

  1. When using the msmbuilder.feature_selection.FeatureSelector object: the features parameter is an OrderedDict.
  2. When using jump type variables: the call to np.linspace creates numpy.int64 (say) variables.
  3. (Haven't tested this) The atom_indices parameter in, for example, AtomPairsFeaturizer is a numpy.ndarray object.
  4. (Haven't tested this) The ref_traj parameter in, for example, RawPositionsFeaturizer is an mdtraj.Trajectory object.
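
The two NumPy cases above (items 2 and 3) can be reproduced without osprey or msmbuilder. This is only a minimal sketch, assuming Python 3 with NumPy installed; the helper json_error and the sample values are made up for illustration.

```python
import json

import numpy as np


def json_error(value):
    """Return the exception type json.dumps raises for value, or None."""
    try:
        json.dumps(value)
    except (TypeError, ValueError) as exc:
        return type(exc)
    return None


# Hypothetical stand-ins for the parameter values described above.
grid_point = np.linspace(1, 10, 4).astype(int)[0]  # a NumPy integer scalar
atom_indices = np.array([[0, 1], [2, 3]])           # a numpy.ndarray

print(json_error(grid_point))    # TypeError on Python 3
print(json_error(atom_indices))  # TypeError

# Converting to native Python types makes both values serializable.
print(json_error(grid_point.item()))      # None
print(json_error(atom_indices.tolist()))  # None
```

Both conversions (.item() for scalars, .tolist() for arrays) return plain Python objects, which is why they pass through json.dumps.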

I've made this rather hacky attempt at a fix:

        tmp_params = clone(estimator).set_params(**params).get_params()
        params = {}
        for k, v in iteritems(tmp_params):
            if not isinstance(v, (BaseEstimator, OrderedDict, np.ndarray)) and (k != 'steps'):
                try:
                    # numpy scalars expose .item(), which returns a native Python value
                    v = v.item()
                except AttributeError:
                    # no .item(): keep the value unchanged
                    pass
                params[k] = v

This seems to work, but perhaps there's a better solution: cast all values to native Python types before they get put in the estimator, and then call a simple function that generically checks whether an object is serializable:

import json

def is_json_serializable(obj):
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: circular reference
        return False
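
One way such a check might be used to filter the parameter dictionary before it is stored is sketched below. The function drop_unserializable and the sample parameters are hypothetical and not part of osprey. The check catches TypeError as well as ValueError, since json.dumps raises TypeError for unsupported types.

```python
import json
from collections import OrderedDict


def is_json_serializable(obj):
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


def drop_unserializable(params):
    """Split params into (kept, dropped_keys) by JSON serializability."""
    kept, dropped = {}, []
    for key, value in params.items():
        if is_json_serializable(value):
            kept[key] = value
        else:
            dropped.append(key)
    return kept, sorted(dropped)


# Hypothetical parameters; object() stands in for a featurizer instance.
params = {
    'n_components': 5,
    'lag_time': 1.5,
    'features': OrderedDict([('dihedrals', object())]),
    'kernel': 'rbf',
}

kept, dropped = drop_unserializable(params)
print(kept)     # {'n_components': 5, 'lag_time': 1.5, 'kernel': 'rbf'}
print(dropped)  # ['features']
```

Returning the dropped keys alongside the kept ones makes it possible to log what was left out of the database record rather than discarding it silently.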

This pertains to at least the following issues:

https://github.com/msmbuilder/osprey/issues/221

https://github.com/msmbuilder/osprey/issues/218

cxhernandez commented 7 years ago

As mentioned in #221, I like the idea of adding an is_json_serializable method. It should at least remedy cases 1, 3, and 4. We should be able to fix case 2 by casting jump variables to int from the get-go.
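
Casting at the source might look roughly like the sketch below. The function jump_grid is hypothetical and only illustrates the idea; it is not osprey's search-space code, and NumPy is assumed to be installed.

```python
import json

import numpy as np


def jump_grid(low, high, num, var_type=int):
    """Evenly spaced grid between low and high as native Python values.

    Hypothetical helper: np.linspace yields NumPy scalars, so each point
    is converted with var_type (int or float) before it is handed out.
    Note that int() truncates toward zero.
    """
    return [var_type(v) for v in np.linspace(low, high, num)]


grid = jump_grid(1, 10, 4)
print(grid)                                  # [1, 4, 7, 10]
print(all(type(v) is int for v in grid))     # True
print(json.dumps({'n_clusters': grid[1]}))   # {"n_clusters": 4}
```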

Thanks again for the thorough debugging!

cxhernandez commented 7 years ago

done in #223 and #224