the-full-stack / fsdl-text-recognizer-2021-labs

Complete deep learning project developed in Full Stack Deep Learning, Spring 2021
https://bit.ly/berkeleyfsdl
MIT License
452 stars, 281 forks

On Colab, pytorch_lightning v1.2 throws ValueError when following setup step !python training/run_experiment.py --max_epochs=3 #9

Closed: ravindrabharathi closed this issue 3 years ago

ravindrabharathi commented 3 years ago

While following the Colab setup steps (https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2021-labs/blob/main/setup/readme.md), the install step pulls the latest pytorch_lightning, v1.2. With this version, running !python training/run_experiment.py --max_epochs=3 fails with the error below.

If pytorch_lightning 1.1.8 is installed instead (!pip install pytorch_lightning==1.1.8), the test step works without issues, as shown in the image in the readme.
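Since the only data point here is "1.1.8 works, 1.2 fails", one option is a small guard that warns before training when the installed version is 1.2 or newer. This is a minimal sketch, not part of the lab repo: the helper names `parse_release`, `is_known_good`, and `check_installed` are hypothetical, and the parser is deliberately simple (pre-release suffixes are dropped, so `1.2.0rc1` counts as 1.2.0).

```python
import warnings

# Hypothetical helper, not part of the lab repo: warn if the installed
# pytorch_lightning is at or above 1.2, the first version reported failing here.


def parse_release(ver: str) -> tuple:
    """Turn '1.1.8' or '1.2.0rc1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # drop suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # '1.2' -> (1, 2, 0)
    return tuple(parts)


def is_known_good(ver: str) -> bool:
    """True for versions below 1.2 (1.1.8 is the version reported working)."""
    return parse_release(ver) < (1, 2, 0)


def check_installed() -> None:
    """Warn if the installed pytorch_lightning is outside the known-good range."""
    try:
        import pytorch_lightning
    except ImportError:
        warnings.warn("pytorch_lightning is not installed")
        return
    installed = pytorch_lightning.__version__
    if not is_known_good(installed):
        warnings.warn(
            f"pytorch_lightning {installed} may raise the `preds` ValueError; "
            "consider: pip install pytorch_lightning==1.1.8"
        )
```

In a Colab cell, calling `check_installed()` before `!python training/run_experiment.py --max_epochs=3` would surface the mismatch before the sanity-check step fails.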

I haven't explored further to see what causes the difference between the two versions (or whether it is already a known issue).

Links to Colab notebooks with pytorch-lightning v1.2 and v1.1.8:

v1.2: https://colab.research.google.com/drive/1DvfGtym_oZRg2q5R78gWm6997LEZj4Ma?usp=sharing
v1.1.8: https://colab.research.google.com/drive/1DBjpKEMTJ9w6U3rNltLcHsw976AvNX9j?usp=sharing

```
  File "training/run_experiment.py", line 90, in <module>
    main()
  File "training/run_experiment.py", line 85, in main
    trainer.fit(lit_model, datamodule=data)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 513, in fit
    self.dispatch()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 553, in dispatch
    self.accelerator.start_training(self)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 111, in start_training
    self._results = trainer.run_train()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 614, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 863, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 732, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 164, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 178, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 128, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/content/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/lit_models/base.py", line 61, in validation_step
    self.val_acc(logits, y)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/metric.py", line 152, in forward
    self.update(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/metric.py", line 199, in wrapped_func
    return update(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 139, in update
    preds, target, threshold=self.threshold, top_k=self.top_k, subset_accuracy=self.subset_accuracy
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/functional/accuracy.py", line 25, in _accuracy_update
    preds, target, mode = _input_format_classification(preds, target, threshold=threshold, top_k=top_k)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/helpers.py", line 439, in _input_format_classification
    top_k=top_k,
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/helpers.py", line 296, in _check_classification_inputs
    _basic_input_validation(preds, target, threshold, is_multiclass)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/helpers.py", line 74, in _basic_input_validation
    raise ValueError("The `preds` should be probabilities, but values were detected outside of [0,1] range.")
ValueError: The `preds` should be probabilities, but values were detected outside of [0,1] range.
```

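The traceback shows `validation_step` in `lit_models/base.py` passing raw `logits` to `self.val_acc`, and the 1.2 accuracy metric rejecting float `preds` outside [0, 1]. Logits are unbounded, so they trip that check, while a softmax maps them into [0, 1] without changing the argmax that top-1 accuracy depends on. The pure-Python sketch below reproduces that reasoning without torch; `check_probabilities` only mimics the condition named in the error message and is not Lightning's code, and the logits and targets are made-up values.

```python
import math
from typing import List

# Illustrative mimic of the condition in the error message, not Lightning's code.


def check_probabilities(preds: List[List[float]]) -> None:
    """Raise the same ValueError message if any float pred lies outside [0, 1]."""
    if any(p < 0 or p > 1 for row in preds for p in row):
        raise ValueError(
            "The `preds` should be probabilities, but values were detected outside of [0,1] range."
        )


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row: List[float]) -> int:
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(preds: List[List[float]], target: List[int]) -> float:
    """Top-1 accuracy, validating preds the way the failing check does."""
    check_probabilities(preds)
    correct = sum(argmax(row) == t for row, t in zip(preds, target))
    return correct / len(target)


logits = [[2.0, -1.0, 0.5], [-0.3, 1.7, 0.2]]  # made-up model outputs
target = [0, 2]

try:
    accuracy(logits, target)
except ValueError as err:
    print("raw logits:", err)

probs = [softmax(row) for row in logits]
print("softmax probs:", accuracy(probs, target))  # argmax unchanged, so 0.5
```

In the lab code, a corresponding workaround would be something like `self.val_acc(torch.softmax(logits, dim=-1), y)` at the line shown in the traceback (and likewise for any train/test accuracy calls). This is a suggested workaround under those assumptions, not necessarily the fix that was pushed to main.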
AlexHandy1 commented 3 years ago

+1

wayfarerjing commented 3 years ago

Same here. Looks like there's a compatibility issue with PL >= 1.2: https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues/551

Daniel8hen commented 3 years ago

+1

numanai commented 3 years ago

+1

Tianqiao-Yvonne commented 3 years ago

+1

sergeyk commented 3 years ago

Thanks for the reports and the fix! Pushed to main branch, closing.