microsoft / DynamicHead

MIT License

Model train execution fails with a PyTorch error message. #29

Closed: shivamsnaik closed this issue 2 years ago

shivamsnaik commented 2 years ago

Hello, I would like to ask if anyone is facing the below issue: TypeError: add(): argument 'alpha' must be Number, not NoneType

The steps I followed are:

  1. python -m pip install -e DynamicHead.
  2. Add a custom dataset using register_coco_instances.
  3. Update the config in def setup(args) with the custom dataset name and the number of classes:
    
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 13
  4. Run the model: python train_net.py --config configs/dyhead_r50_rcnn_fpn_1x.yaml --num-gpus 1.

The detailed error message goes like this:

  File "train_net.py", line 204, in <module>
    launch(
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 198, in main
    return trainer.train()
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 294, in run_step
    self.optimizer.step()
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/sgd.py", line 136, in step
    F.sgd(params_with_grad,
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/_functional.py", line 164, in sgd
    d_p = d_p.add(param, alpha=weight_decay)
TypeError: add(): argument 'alpha' must be Number, not NoneType

Environment details:

sys.platform = linux
Python = 3.8.12
numpy = 1.21.2
detectron2 = 0.6
CUDA = 11.4
PyTorch = 1.10.0
torchvision = 0.11.0a0
fvcore = 0.1.5.post20211023
iopath = 0.1.9
cv2 = 4.5.4

I would appreciate any help if you know the solution.

MajorityRreport commented 2 years ago

Hi, I have the same problem as you. Have you solved it?

Houseqin commented 2 years ago

I have the same problem. Have you solved it?

shivamsnaik commented 2 years ago

@Houseqin @MajorityRreport Hi. Yes I did solve the issue.

The error occurs because the weight decay is not passed to PyTorch. To fix it, add the following setting to your config YAML file: cfg.SOLVER.WEIGHT_DECAY_BIAS = <desired_value>

This was what caused the issue in my case. Also, if cfg.SOLVER.WEIGHT_DECAY is missing from your config, include that as well.

That should resolve the error.
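
For anyone who wants to catch this before training starts, here is a minimal sketch of a guard that checks the solver's weight decay values. It uses a `SimpleNamespace` as a stand-in for `cfg.SOLVER`, and the helper name `resolve_weight_decays` is hypothetical; it is not part of detectron2 or DynamicHead. Falling back from the bias decay to `WEIGHT_DECAY` is my assumption about a sensible default, not something confirmed by the repo.

```python
from types import SimpleNamespace


def resolve_weight_decays(solver):
    """Return (weight_decay, weight_decay_bias) as floats, never None.

    `solver` is any object with WEIGHT_DECAY and WEIGHT_DECAY_BIAS
    attributes (a stand-in for cfg.SOLVER; this helper is hypothetical).
    """
    weight_decay = getattr(solver, "WEIGHT_DECAY", None)
    if weight_decay is None:
        raise ValueError(
            "SOLVER.WEIGHT_DECAY is not set; the optimizer would receive None"
        )

    weight_decay_bias = getattr(solver, "WEIGHT_DECAY_BIAS", None)
    if weight_decay_bias is None:
        # Assumption: reuse the general weight decay for bias parameters.
        weight_decay_bias = weight_decay

    return float(weight_decay), float(weight_decay_bias)


# Stand-in for a config where only WEIGHT_DECAY is defined.
solver = SimpleNamespace(WEIGHT_DECAY=0.0001, WEIGHT_DECAY_BIAS=None)
weight_decay, weight_decay_bias = resolve_weight_decays(solver)
print(weight_decay, weight_decay_bias)
```

Calling something like this on the real `cfg.SOLVER` right before the optimizer is built would turn the late `TypeError` inside `optimizer.step()` into an early, readable message.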

MajorityRreport commented 2 years ago

Thank you very much!!

shivamsnaik commented 2 years ago

@MajorityRreport Glad to help. Let me know if it still throws the same error.

Houseqin commented 2 years ago

The official config YAML dyhead_swint_atss_fpn_2x_ms.yaml has the same problem, even after adding cfg.SOLVER.WEIGHT_DECAY_BIAS and cfg.SOLVER.WEIGHT_DECAY.

shivamsnaik commented 2 years ago

@Houseqin I have never seen this error before. Is it interfering with normal operation?

If yes, you can try replicating the versions of CUDA, PyTorch, etc. that I listed in the issue description. It works perfectly for me after the changes mentioned above.

Houseqin commented 2 years ago

I found that it was a compile problem, and fixed it by following the entry for "nvcc not found", "Not compiled with GPU support", or "Detectron2 CUDA Compiler: not available" at https://detectron2.readthedocs.io/en/latest/tutorials/install.html#common-installation-issues
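
In case it helps others check for the same compile problem, here is a rough sketch that looks for the nvcc compiler from Python. The helper name `find_nvcc` is made up for illustration, and checking `CUDA_HOME/bin` is an assumption based on the common convention for CUDA installs; your setup may keep nvcc somewhere else.

```python
import os
import shutil


def find_nvcc():
    """Return the path to nvcc if it can be located, else None."""
    # First look on PATH, the way a shell would.
    path = shutil.which("nvcc")
    if path:
        return path

    # Then try CUDA_HOME/bin (assumed convention, may not apply everywhere).
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    return None


nvcc = find_nvcc()
if nvcc:
    print("nvcc found at:", nvcc)
else:
    print("nvcc not found; CUDA extensions may have been built without GPU support")
```

If this reports that nvcc is missing, the install page linked above is the place to start before rebuilding.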

Thank you most sincerely.

Edwardmark commented 2 years ago

cfg.SOLVER.WEIGHT_DECAY_BIAS and cfg.SOLVER.WEIGHT_DECAY

Hi, could you please tell me what values should be set for these decay settings? Thanks.

MajorityRreport commented 2 years ago

WEIGHT_DECAY: 0.05
WEIGHT_DECAY_BIAS: 0.01

Sorry, I'm just a beginner. I set them like this (arbitrary numbers), and the network could run. But maybe the network didn't suit my dataset, so it didn't work well. I hope it helps.