kaylode / theseus

General template for most Pytorch projects
MIT License

Possible issue with trainer #37

Closed lannguyen0910 closed 2 years ago

lannguyen0910 commented 2 years ago

Hi @kaylode, if you have time, could you update the notebooks (the config part)? When testing, I encountered this error:

Traceback (most recent call last):
  File "/content/main/configs/classification/train.py", line 10, in <module>
    train_pipeline.fit()
  File "/content/main/theseus/base/pipeline.py", line 237, in fit
    self.trainer.fit()
  File "/content/main/theseus/base/trainer/base_trainer.py", line 71, in fit
    self.training_epoch()
  File "/content/main/theseus/base/trainer/supervised_trainer.py", line 83, in training_epoch
    self.scaler(loss, self.optimizer)
TypeError: 'bool' object is not callable

So I think it might be a problem with the scaler. After changing the default of use_fp16 to True in BaseTrainer, it runs, like this:

class BaseTrainer():
    def __init__(self,
                 use_fp16: bool = True,
                 ...
                 ):
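For reference, a callable scaler wrapper matching the `self.scaler(loss, self.optimizer)` call in the traceback might look like this. This is a hypothetical sketch, not the repository's actual implementation; the class name `NativeScaler` and the `enabled` flag are assumptions. With `enabled=False`, every `GradScaler` call becomes a pass-through, so the same call site works with and without fp16:

```python
import torch

class NativeScaler:
    """Hypothetical callable wrapper around torch.cuda.amp.GradScaler,
    matching the self.scaler(loss, self.optimizer) call in the trace."""

    def __init__(self, enabled: bool = True):
        # enabled=False turns every GradScaler method into a no-op
        # pass-through, so one code path covers both fp16 and fp32.
        self.scaler = torch.cuda.amp.GradScaler(enabled=enabled)

    def __call__(self, loss, optimizer):
        # Scale the loss (identity when disabled), backprop, then step.
        self.scaler.scale(loss).backward()
        self.scaler.step(optimizer)
        self.scaler.update()
```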

Without that change it doesn't work, even though I've already set the global use_fp16 variable to True:

global:
  exp_name: null
  exist_ok: false
  debug: true
  cfg_transform: configs/classification/transform.yaml
  save_dir: /content/main/runs
  device: cuda:0
  use_fp16: true
  pretrained: null
  resume: null
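For what it's worth, the flag only takes effect if it is actually forwarded from the parsed config into the trainer. A minimal sketch of that wiring (the dict mirrors the YAML above; the commented-out trainer call is hypothetical):

```python
# Minimal sketch: read the global use_fp16 flag from the parsed config
# and forward it to the trainer constructor. If this forwarding is
# missing, the default value in BaseTrainer silently wins, which would
# match the behavior described above.
config = {
    "global": {
        "device": "cuda:0",
        "use_fp16": True,
    }
}

use_fp16 = config["global"].get("use_fp16", False)
# trainer = SupervisedTrainer(use_fp16=use_fp16, ...)  # hypothetical wiring
```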

So I think this is a genuine issue. I notice that SupervisedTrainer doesn't handle the case where use_fp16 is set to False, so the scaler stays a plain bool, and calling it triggers TypeError: 'bool' object is not callable.
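A minimal guard in the training step could look like the following. This is a sketch, not the project's actual code; the names `use_fp16`, `scaler`, and `optimizer` are assumed from the traceback, and the class here is a stripped-down stand-in for SupervisedTrainer:

```python
import torch

class SupervisedTrainerStep:
    """Hypothetical sketch of a training step that guards the scaler,
    so use_fp16=False falls back to a plain backward/step instead of
    attempting to call a bool."""

    def __init__(self, model, optimizer, use_fp16: bool = False):
        self.model = model
        self.optimizer = optimizer
        self.use_fp16 = use_fp16
        # Only build a scaler when fp16 is requested; otherwise keep None.
        self.scaler = torch.cuda.amp.GradScaler() if use_fp16 else None

    def step(self, loss):
        self.optimizer.zero_grad()
        if self.scaler is not None:
            # fp16 path: scale the loss, step through the scaler
            self.scaler.scale(loss).backward()
            self.scaler.step(self.optimizer)
            self.scaler.update()
        else:
            # fp32 path: plain backward and optimizer step
            loss.backward()
            self.optimizer.step()
```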

kaylode commented 2 years ago

Exactly, I'll have to re-read the usage documentation of torch.cuda.amp.GradScaler to correct this.