msmbuilder / osprey

🦅Hyperparameter optimization for machine learning pipelines 🦅
http://msmbuilder.org/osprey
Apache License 2.0

Problem loading Pipeline as a pkl file #218

Closed: jeiros closed this issue 7 years ago

jeiros commented 7 years ago

From what I understood from the docs, I can load an arbitrary model into osprey as a pkl file. I've done it like so:

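```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from msmbuilder.featurizer import AtomPairsFeaturizer
from msmbuilder.decomposition import tICA
from msmbuilder.cluster import KMeans
from msmbuilder.msm import MarkovStateModel
from msmbuilder.utils import dump  # assumed source of dump(): msmbuilder's pickle helper

# atom_pairs: np.ndarray of shape (419, 2) holding atom-index pairs, defined earlier
pipeline = Pipeline([
    ('feat', AtomPairsFeaturizer(atom_pairs)),
    ('scale', RobustScaler()),
    ('tica', tICA(n_components=3)),
    ('cluster', KMeans()),
    ('msm', MarkovStateModel(lag_time=100, n_timescales=5))
])
dump(pipeline, 'osprey/pipe.pkl')
```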

Here atom_pairs was defined earlier as an np.array of shape (419, 2). I'm not sure I'm using the config file properly, but here is what I came up with:

```yaml
estimator:
  pickle: pipe.pkl

strategy:
  name: random  # or gp, hyperopt_tpe

search_space:
  cluster__n_clusters:
    min: 5
    max: 1000
    type: int

  tica__lag_time:
    min: 1
    max: 200
    type: int

cv:
  name: shufflesplit

dataset_loader:
  name: mdtraj
  params:
    trajectories: ../filtered/trajs/*.nc
    topology: ../filtered/striped.prmtop
    stride: 1

trials:
  uri: sqlite:///osprey-trials.db
```
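The `search_space` keys use scikit-learn's `<step>__<parameter>` naming, so a sampled point such as `cluster__n_clusters` addresses the `n_clusters` parameter of the step named `cluster`. The sketch below assumes that the sampled values reach the pipeline through scikit-learn's `Pipeline.set_params`; the worker log only shows the chosen dictionary. To run without msmbuilder, it swaps in scikit-learn's `KMeans` as a stand-in for the msmbuilder steps, and the value 268 is the one the worker sampled in the output further down this thread.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.cluster import KMeans

# Stand-in pipeline: scikit-learn's KMeans replaces msmbuilder's steps (assumption for illustration)
pipe = Pipeline([
    ("scale", RobustScaler()),
    ("cluster", KMeans()),
])

# One point from the search space; "cluster__n_clusters" routes to step "cluster", parameter "n_clusters"
sampled = {"cluster__n_clusters": 268}
pipe.set_params(**sampled)

print(pipe.named_steps["cluster"].n_clusters)  # 268
```

The same naming is why every nested parameter of the pickled pipeline, not just the two being searched, shows up under a `step__parameter` key in `pipe.get_params()`.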

Running `osprey worker config.yaml` starts without problems and prints the following:

```
======================================================================
= osprey is a tool for machine learning hyperparameter optimization. =
======================================================================

osprey version:      1.1.0
time:                April 21, 2017  3:08 PM
hostname:            ch-igould-titanx2.ch.ic.ac.uk
cwd:                 /home/je714/ligand_binding/G159D_P/SilybinA/MD3/S1P/adaptive/osprey
pid:                 17670

Loading config file:     config.yaml...

msmbuilder version:  3.7.0
mdtraj version:      1.8.0

Loading dataset...

Dataset contains 11 element(s) with out labels
The elements have shape: [(1250,), (1250,), (1250,), (1250,), (1250,), (1250,), (1250,), (1250,), (1250,), (1250,), (1250,)]
Instantiated estimator:
  Pipeline(steps=[('feat', AtomPairsFeaturizer(exponent=1.0,
          pair_indices=array([[6825,    4],
       [6825,   21],
       ...,
       [6825, 6789],
       [6825, 6813]]),
          periodic=False)), ('scale', RobustScaler(copy=True, quantile_range=(25.0, 75.0), with_centering=True,
       with_scali...les=5,
         prior_counts=0, reversible_type='mle', sliding_window=True,
         verbose=True))])
Hyperparameter search space:
  cluster__n_clusters       (int)          5 <= x <= 1000
  tica__lag_time            (int)          1 <= x <= 200

----------------------------------------------------------------------
Beginning iteration                                              1 / 1
----------------------------------------------------------------------
Loading trials database: sqlite:///osprey-trials.db...
History contains: 0 trials
Choosing next hyperparameters with random...
  {'cluster__n_clusters': 268, 'tica__lag_time': 114}
(random took 0.000 s)
```

Then it crashes with a long traceback.

The traceback comes from sqlalchemy, but I presume something is wrong with how I specified the config file. Thanks for any help or comments!

```
An unexpected error has occurred with osprey (version 1.1.0), please consider sending the following traceback to the osprey GitHub issue tracker at: https://github.com/pandegroup/osprey/issues
Traceback (most recent call last):
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1116, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 623, in _init_compiled
    param.append(processors[key](compiled_params[key]))
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/sql/type_api.py", line 1078, in process
    return process_param(value, dialect)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/trials.py", line 26, in process_bind_param
    value = json.dumps(value)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'ndarray' is not JSON serializable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/je714/anaconda3/envs/msmbuilder/bin/osprey", line 6, in <module>
    sys.exit(osprey.cli.main.main())
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/cli/main.py", line 37, in main
    args_func(args, p)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/cli/main.py", line 42, in args_func
    args.func(args, p)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/cli/parser_worker.py", line 8, in func
    execute(args, parser)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/execute_worker.py", line 81, in execute
    project_name=project_name, sessionbuilder=config.trialscontext)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/execute_worker.py", line 123, in initialize_trial
    session.commit()
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 874, in commit
    self.transaction.commit()
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 461, in commit
    self._prepare_impl()
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 441, in _prepare_impl
    self.session.flush()
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2139, in flush
    self._flush(objects)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2259, in _flush
    transaction.rollback(_capture_exception=True)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2223, in _flush
    flush_context.execute()
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 389, in execute
    rec.execute(self)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 548, in execute
    uow
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
    mapper, table, insert)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 835, in _emit_insert_statements
    execute(statement, params)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1121, in _execute_context
    None, None)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1116, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 623, in _init_compiled
    param.append(processors[key](compiled_params[key]))
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/sqlalchemy/sql/type_api.py", line 1078, in process
    return process_param(value, dialect)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/site-packages/osprey/trials.py", line 26, in process_bind_param
    value = json.dumps(value)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/home/je714/anaconda3/envs/msmbuilder/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
sqlalchemy.exc.StatementError: (builtins.TypeError) Object of type 'ndarray' is not JSON serializable [SQL: 'INSERT INTO trials_v3 (project_name, status, parameters, mean_test_score, mean_train_score, train_scores, test_scores, n_train_samples, n_test_samples, started, completed, elapsed, host, user, traceback, config_sha1) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: [{'host': 'ch-igould-titanx2.ch.ic.ac.uk', 'status': 'PENDING', 'started': datetime.datetime(2017, 4, 21, 15, 8, 8, 415322), 'config_sha1': '05f1621576ee9b088120cdcfe199455b6a745b6b', 'parameters': {'feat__exponent': 1.0, 'feat__pair_indices': array([[6825, 4], [6825, 21], [6825, 33], [6825, 45], [6825, 64], ... (9312 characters truncated) ... msm__lag_time': 100, 'msm__n_timescales': 5, 'msm__prior_counts': 0, 'msm__reversible_type': 'mle', 'msm__sliding_window': True, 'msm__verbose': True}, 'user': 'je714', 'project_name': 'default', 'mean_test_score': None, 'mean_train_score': None, 'train_scores': None, 'traceback': None, 'completed': None, 'n_train_samples': None, 'test_scores': None, 'elapsed': None, 'n_test_samples': None}]]
```
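The traceback shows that osprey's `trials.py` calls `json.dumps` on the trial's `parameters` dict when it inserts the row, and that dict holds the featurizer's `pair_indices` array. The sketch below reproduces that failure in isolation with a few entries copied from the `parameters` dict in the traceback; the two-row array is a truncated stand-in for the full (419, 2) `feat__pair_indices`.

```python
import json

import numpy as np

# Subset of the trial's 'parameters' dict from the traceback (array truncated to two rows)
params = {
    "feat__exponent": 1.0,
    "feat__pair_indices": np.array([[6825, 4], [6825, 21]]),
    "msm__lag_time": 100,
}

# The standard json encoder has no rule for ndarray, so this raises TypeError
try:
    json.dumps(params)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message differs slightly between Python versions, but it names `ndarray` as the type that cannot be serialized, as in the traceback.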
cxhernandez commented 7 years ago

Not sure what's going on, but I'll take a look into this!

RobertArbon commented 7 years ago

I'm pretty sure this is the same as my problem. Osprey is trying to serialize the atom-pair ndarray (`feat__pair_indices`) as a parameter.
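One generic way around an ndarray-valued parameter is to give `json.dumps` an encoder that converts NumPy arrays to nested lists. This is a sketch under that assumption, not a description of what #223 actually changed; `NumpyJSONEncoder` is a hypothetical name, while `json.JSONEncoder.default` and the `cls=` argument of `json.dumps` are standard-library API.

```python
import json

import numpy as np


class NumpyJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts NumPy arrays and scalars to plain Python types."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()  # nested Python lists of Python ints/floats
        if isinstance(o, np.generic):
            return o.item()  # e.g. np.int64 -> int
        return super().default(o)  # anything else still raises TypeError


params = {
    "feat__exponent": 1.0,
    "feat__pair_indices": np.array([[6825, 4], [6825, 21]]),
    "msm__lag_time": 100,
}

encoded = json.dumps(params, cls=NumpyJSONEncoder)
print(encoded)
# {"feat__exponent": 1.0, "feat__pair_indices": [[6825, 4], [6825, 21]], "msm__lag_time": 100}
```

Decoding gives back lists rather than arrays, so anything that reads stored parameters and needs an array would have to call `np.asarray` on them.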

cxhernandez commented 7 years ago

Done in #223.