minerva-ml / minerva-training-materials

Learn advanced data science on real-life, curated problems
https://neptune.ml/minerva
MIT License

`--train_mode False` in whales doesn't work #21

Closed by buus2 6 years ago

buus2 commented 6 years ago

For

python run_minerva.py -- dry_run --problem whales --sub_problem localization --train_mode False

I obtain

2018-01-18 13-34-59 minerva >>> starting experiment...
2018-01-18 13-35-01 minerva >>> running: localization
neptune: Executing in Offline Mode.
2018-01-18 13-35-01 minerva >>> step localizer_loader unpacking inputs
2018-01-18 13-35-01 minerva >>> step localizer_loader loading...
2018-01-18 13-35-01 minerva >>> step localizer_loader transforming...
2018-01-18 13-35-01 minerva >>> step localizer_network unpacking inputs
2018-01-18 13-35-01 minerva >>> initializing model weights...
2018-01-18 13-35-01 minerva >>> starting training...
2018-01-18 13-35-01 minerva >>> initial lr: 0.0005
2018-01-18 13-35-01 minerva >>> epoch 0 ...
2018-01-18 13-35-18 minerva >>> epoch 0 batch 0 ...
/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/torch/nn/modules/container.py:67: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
2018-01-18 13-35-26 minerva >>> epoch 0 batch 0 loss:     4.85479
2018-01-18 13-35-26 minerva >>> epoch 0 batch 0 accuracy: 0.00000
2018-01-18 13-35-26 minerva >>> epoch 0 average batch time: 0:00:08.0
(...) [ANALOGOUS STUFF FOR BATCHES 1-14]
/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/torch/nn/modules/container.py:67: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
2018-01-18 13-36-04 minerva >>> epoch 0 batch 15 loss:     4.70346
2018-01-18 13-36-04 minerva >>> epoch 0 batch 15 accuracy: 0.00000
2018-01-18 13-36-04 minerva >>> epoch 0 model saved to output/path_to_your_solution/checkpoints/localizer_network/model_epoch0.torch
2018-01-18 13-36-04 minerva >>> epoch 1 current lr: 0.0005
2018-01-18 13-36-04 minerva >>> epoch 0 loss:     4.78818
2018-01-18 13-36-04 minerva >>> epoch 0 accuracy: 0.02295
Traceback (most recent call last):
  File "run_minerva.py", line 46, in <module>
    action()
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva_venv/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "run_minerva.py", line 27, in dry_run
    pm.dry_run(sub_problem, train_mode, dev_mode, cloud_mode)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/whales/problem_manager.py", line 25, in dry_run
    _evaluate(trainer, sub_problem)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/whales/problem_manager.py", line 49, in _evaluate
    score_valid, score_test = trainer.evaluate()
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/trainer.py", line 22, in evaluate
    score_valid = self._evaluate(X_valid, y_valid)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/whales/trainer.py", line 68, in _evaluate
    'train_mode': False,
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 102, in transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 68, in fit_transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 74, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 84, in _cached_fit_transform
    step_output_data = self.transformer.fit_transform(**step_inputs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/base.py", line 206, in fit_transform
    self.fit(*args, **kwargs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/models/pytorch/models.py", line 61, in fit
    self.callbacks.on_epoch_end()
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/models/pytorch/callbacks.py", line 86, in on_epoch_end
    callback.on_epoch_end(*args, **kwargs)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/models/pytorch/callbacks.py", line 154, in on_epoch_end
    val_loss, val_acc = score_model_multi_output(self.model, self.loss_function, self.validation_datagen)
  File "/home/patryk/Documents/edukacyjne/Minerva/0401/minerva/minerva/backend/models/pytorch/validation.py", line 50, in score_model_multi_output
    for batch_id, data in enumerate(batch_gen):
TypeError: 'NoneType' object is not iterable
Sentry is attempting to send 1 pending error messages
Waiting up to 10 seconds
Press Ctrl-C to quit

The same error arises when I use Neptune cloud. Everything works with the default `--train_mode True`.

jakubczakon commented 6 years ago

Use submit or dry_run with train_mode=False only when the specified solution_dir already contains all trained transformers.

I added an exception with a clear message to let users know about this quickly. I am waiting for the PR to be merged.
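For illustration, a fail-fast check of this kind could look roughly like the sketch below. This is not the code from the PR: the function name `check_trained_transformers`, the exception class `MissingTransformersError`, and the `<solution_dir>/transformers/<step_name>` layout are all assumptions made for the example. Only the step names `localizer_loader` and `localizer_network` come from the log above.

```python
import os


class MissingTransformersError(RuntimeError):
    """Hypothetical error: inference requested but trained transformers are absent."""


def check_trained_transformers(solution_dir, step_names, train_mode):
    """Raise a clear error if train_mode is False and a step has no saved transformer.

    Assumed (hypothetical) layout: <solution_dir>/transformers/<step_name>
    """
    if train_mode:
        # Training creates the transformers, so there is nothing to verify.
        return

    transformers_dir = os.path.join(solution_dir, "transformers")
    missing = [
        name for name in step_names
        if not os.path.exists(os.path.join(transformers_dir, name))
    ]
    if missing:
        raise MissingTransformersError(
            "train_mode=False requires trained transformers in {!r}, "
            "but none were found for: {}. Run with train_mode=True first "
            "or point solution_dir at a trained solution.".format(
                solution_dir, ", ".join(missing)
            )
        )
```

Calling something like `check_trained_transformers("output/path_to_your_solution", ["localizer_loader", "localizer_network"], train_mode=False)` before the pipeline starts would surface the problem immediately, rather than as `TypeError: 'NoneType' object is not iterable` after a full training epoch.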

kamil-kaczmarek commented 6 years ago

@buus2 The PR is merged, with added error handling for train_mode=False and other related errors.