minerva-ml / minerva-training-materials

Learn advanced data science on real-life, curated problems
https://neptune.ml/minerva
MIT License

--train_mode False raises an error #9

Closed: buus2 closed this issue 6 years ago

buus2 commented 6 years ago
python run_minerva.py -- dry_run --problem fashion_mnist

works whereas

python run_minerva.py -- dry_run --problem fashion_mnist --train_mode False

raises an error:

~/Documents/edukacyjne/Minerva/0401/minerva$ python run_minerva.py -- dry_run --problem fashion_mnist --train_mode False
2018-01-11 13-08-12 minerva >>> starting experiment...
Using TensorFlow backend.
2018-01-11 13-08-14 minerva >>> running: None
neptune: Executing in Offline Mode.
2018-01-11 13-08-14 minerva >>> Saving graph to path/to/your/solution/class_predictions_graph.json
2018-01-11 13-08-14 minerva >>> step input unpacking inputs
2018-01-11 13-08-14 minerva >>> step input loading...
2018-01-11 13-08-14 minerva >>> step input transforming...
2018-01-11 13-08-14 minerva >>> step keras_model unpacking inputs
Epoch 1/200
2018-01-11 13:08:15.268968: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 13:08:15.269056: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 13:08:15.269094: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 13:08:15.269123: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-11 13:08:15.269150: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
46/47 [============================>.] - ETA: 1s - loss: 0.4396 - acc: 0.9772/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/keras/callbacks.py:494: RuntimeWarning: Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: acc,loss
  (self.monitor, ','.join(list(logs.keys()))), RuntimeWarning
Traceback (most recent call last):
  File "run_minerva.py", line 46, in <module>
    action()
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "run_minerva.py", line 27, in dry_run
    pm.dry_run(sub_problem, train_mode, dev_mode, cloud_mode)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/fashion_mnist/problem_manager.py", line 16, in dry_run
    _evaluate(trainer)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/fashion_mnist/problem_manager.py", line 39, in _evaluate
    score_valid, score_test = trainer.evaluate()
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/trainer.py", line 22, in evaluate
    score_valid = self._evaluate(X_valid, y_valid)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/fashion_mnist/trainer.py", line 29, in _evaluate
    'inference': True}})
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 102, in transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 74, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 84, in _cached_fit_transform
    step_output_data = self.transformer.fit_transform(**step_inputs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 206, in fit_transform
    self.fit(*args, **kwargs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/models/keras/models_keras.py", line 28, in fit
    **self.training_config)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/keras/engine/training.py", line 2187, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/keras/callbacks.py", line 73, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/models/keras/callbacks_keras.py", line 19, in on_epoch_end
    self.ctx.channel_send('Log-loss validation', self.epoch_id, logs['val_loss'])
KeyError: 'val_loss'
Sentry is attempting to send 1 pending error messages
Waiting up to 10 seconds
Press Ctrl-C to quit

kamil-kaczmarek commented 6 years ago

Hi @buus2, thank you. I will take a closer look at it.

jakubczakon commented 6 years ago

The problem is that running dry_run with train_mode=False requires an already trained pipeline. If config.yaml points to an empty directory, for example

    parameters:
      solution_dir: my/dir

then there is no pipeline in it. This means that running dry_run with train_mode=True first and then again with train_mode=False will work, while running with train_mode=False first will not.
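
For illustration, the ordering requirement described here could be turned into an early, readable check. This is only a sketch, not code from the repository: `has_trained_pipeline` and `check_can_run` are hypothetical helpers, and treating "directory is non-empty" as "a trained pipeline exists" is an assumption about how solution_dir is used.

```python
import os


def has_trained_pipeline(solution_dir):
    """Return True if solution_dir exists and contains at least one entry.

    Assumption: a non-empty directory stands in for "a trained pipeline
    was saved here"; the real project may lay out its files differently.
    """
    return os.path.isdir(solution_dir) and len(os.listdir(solution_dir)) > 0


def check_can_run(solution_dir, train_mode):
    """Fail early with a clear message instead of deep inside the pipeline."""
    if not train_mode and not has_trained_pipeline(solution_dir):
        raise RuntimeError(
            "No trained pipeline found in {!r}; run dry_run with "
            "train_mode=True first.".format(solution_dir)
        )
```

Calling `check_can_run("my/dir", train_mode=False)` on an empty directory would then raise a RuntimeError that names the missing pipeline, rather than surfacing later as a KeyError.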

Moreover to run any task in fashion_mnist you need to run the dry_run first (this will be changed soon).
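
As a side note on the traceback above: the final KeyError comes from indexing logs['val_loss'] while Keras reported only acc and loss for the epoch, which is what happens when fitting runs without validation data. A more defensive way to forward metrics might look like the sketch below. It is an assumption-laden illustration, not the project's callback: `send_epoch_metrics` is a hypothetical helper, `channel_send` stands in for the ctx.channel_send call seen in the traceback, and the 'Log-loss training' channel name is invented.

```python
def send_epoch_metrics(channel_send, epoch_id, logs):
    """Forward only the metrics that are actually present in `logs`.

    `channel_send` is a stand-in for ctx.channel_send from the traceback;
    it is called as channel_send(channel_name, x, y).
    Returns the list of metric keys that were sent.
    """
    channels = [
        ("loss", "Log-loss training"),        # hypothetical channel name
        ("val_loss", "Log-loss validation"),  # channel name from the traceback
    ]
    logs = logs or {}
    sent = []
    for key, channel in channels:
        value = logs.get(key)
        if value is None:
            # Metric missing, e.g. no validation data means no val_loss.
            continue
        channel_send(channel, epoch_id, value)
        sent.append(key)
    return sent
```

With logs containing only acc and loss, this sends the training loss and silently skips the validation channel instead of raising.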