RuntimeError: value cannot be converted to type float without overflow

Hi, thanks for your great work! I tried running the training script to reproduce the results and hit an error at step 40000 (i.e. the last step).

INFO     [val 512,512] mean_IU:0.685917  IU_array:[0.96660748 0.75127033 0.89316559 0.47657077 0.52247761 0.55418666
 0.61232051 0.69269029 0.89863761 0.62701722 0.91556037 0.746619
 0.50167771 0.92389765 0.5565986  0.66125612 0.67333135 0.35126003
 0.70728074]
INFO     step:40000 G_lr:0.000000 G_loss:218.70508(mc:0.19767 pixelwise:217.88451 pairwise:0.00256) D_lr:0.000000 D_loss:0.06917
Traceback (most recent call last):
  File "train_and_eval.py", line 31, in <module>
    model.optimize_parameters()
  File "/segmentation/structure_knowledge_distillation/networks/kd_model.py", line 171, in optimize_parameters
    self.G_solver.step()
  File "/anaconda3/lib/python3.6/site-packages/torch/optim/sgd.py", line 106, in step
    p.data.add_(-group['lr'], d_p)
RuntimeError: value cannot be converted to type float without overflow: (6.86045e-07,-2.22909e-07)
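
The pair printed in the error message reads like a complex number (real part, imaginary part) being passed as the learning rate. Below is a minimal sketch of one way this could happen, assuming the learning rate follows a polynomial decay and the optimizer is stepped once more after the final iteration. The function names `poly_lr` and `poly_lr_clamped`, the base rate 0.01, and the power 0.9 are assumptions for illustration and are not taken from the repository's code. In Python 3, raising a negative float to a fractional power returns a complex value, which SGD cannot convert to a float.

```python
def poly_lr(base_lr, step, max_steps, power):
    """Hypothetical polynomial decay: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1 - float(step) / max_steps) ** power


def poly_lr_clamped(base_lr, step, max_steps, power):
    """Same decay, but the base is clamped at 0 so it never goes negative."""
    remaining = max(0.0, 1 - float(step) / max_steps)
    return base_lr * remaining ** power


# Assumed values: base_lr=0.01 and power=0.9 are guesses, max_steps matches --num-steps.
base_lr, max_steps, power = 0.01, 40000, 0.9

# At the last scheduled step the rate is exactly zero (logged as G_lr:0.000000).
print(poly_lr(base_lr, max_steps, max_steps, power))

# One step past the end the base is negative, so ** returns a complex number.
overshoot = poly_lr(base_lr, max_steps + 1, max_steps, power)
print(type(overshoot), -overshoot)

# The clamped variant stays a real float.
print(poly_lr_clamped(base_lr, max_steps + 1, max_steps, power))
```

Under these assumed values, `-overshoot` comes out close to the pair shown in the error, which would be consistent with the schedule being evaluated at step 40001. Clamping the base at zero, or stopping the loop before the extra step, are two possible ways to avoid the complex value.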

I made two changes to the training scripts. The first is to train the student model without loading the ImageNet pre-trained weights. The second is to import InPlaceABN directly from the inplace_abn package instead of the libs directory, to make the project compatible with PyTorch v1.0 and above. Here is the edited shell script:

is_pi_use=True
is_pa_use=True
is_ho_use=True
lambda_pi=10.0
lambda_d=0.1

# start KD from step 0 without loading the ImageNet-pretrained model on the student
CUDA_VISIBLE_DEVICES='3' python -m torch.distributed.launch --nproc_per_node 1 train_and_eval.py \
    --gpu 0 \
    --parallel False \
    --random-mirror \
    --random-scale \
    --weight-decay 5e-4 \
    --data-dir '/Datasets/cityscapes' \
    --batch-size 8 \
    --num-steps 40000 \
    --is-student-load-imgnet False \
    --S_resume False \
    --T_ckpt_path 'Teacher_city.pth' \
    --student-pretrain-model-imgnet ./dataset/resnet18-imagenet.pth \
    --pi ${is_pi_use} \
    --pa ${is_pa_use} \
    --ho ${is_ho_use} \
    --lambda-pa 0.5 \
    --pool-scale 0.5 \
    --lambda-pi ${lambda_pi} \
    --lambda-d ${lambda_d}

Could you help me solve this problem? Thanks!

irfanICMLL / structure_knowledge_distillation
