yjh0410 / yolov2-yolov3_PyTorch


the loss is nan #53

Open lostlose opened 2 years ago

lostlose commented 2 years ago

Hi, I ran into the same problem as https://github.com/yjh0410/PyTorch_YOLOv3/issues/1 when training YOLOv3 with this project.

1023280072 commented 2 years ago

Maybe the reason is that the batch_size is too small. When I trained YOLOv2 with a batch_size of 4 or 8, the loss became NaN; when I increased it to 32, the problem no longer happened. Hope this is useful.
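If memory keeps you from raising batch_size directly, gradient accumulation is one way to approximate a larger effective batch. A minimal, runnable sketch with a stand-in model and random data (not this repo's actual trainer or loss):

```python
import torch
import torch.nn as nn

# Stand-in model, loss, and data; replace with the repo's YOLO model, detection loss, and dataloader.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

accum_steps = 4      # e.g. 4 micro-batches of 8 ~= effective batch of 32
micro_batch = 8

optimizer.zero_grad()
for step in range(100):
    images = torch.randn(micro_batch, 3, 64, 64)   # stand-in for loaded images
    targets = torch.randn(micro_batch, 1)          # stand-in for encoded targets
    loss = criterion(model(images), targets)
    (loss / accum_steps).backward()  # scale so the accumulated gradient matches one large batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```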

yjh0410 commented 2 years ago

@lostlose Please try another YOLO project of mine: https://github.com/yjh0410/PyTorch_YOLO-Family

lostlose commented 2 years ago

@1023280072 Thank you! Due to memory constraints, I can only set the batch_size to 16, and now I'm trying to adjust the learning rate to get the correct results.
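For adjusting the learning rate to a smaller batch, one common heuristic (not necessarily what this repo uses) is the linear scaling rule plus a short warmup so the earliest updates don't blow up. A rough sketch, where the reference values are assumptions to be tuned:

```python
# Assumed reference values -- adjust to the config you actually train with.
base_lr = 1e-3          # learning rate tuned for a reference batch size
reference_batch = 64
batch_size = 16
warmup_iters = 1000

scaled_lr = base_lr * batch_size / reference_batch   # linear scaling rule

def lr_at(iteration):
    """Linear warmup from near zero to the scaled LR, then hold it constant."""
    if iteration < warmup_iters:
        return scaled_lr * (iteration + 1) / warmup_iters
    return scaled_lr

# Apply each iteration before optimizer.step(), e.g.:
# for group in optimizer.param_groups:
#     group["lr"] = lr_at(iteration)
```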

lostlose commented 2 years ago

@yjh0410 OK, thanks, I will try it later!

aknirala commented 1 month ago

The loss is NaN even at batch size 48. I was using RN50 as the backbone and trying to train on the COCO dataset.
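If the NaN persists regardless of batch size, it can help to locate where it first appears and to guard the update step. A small debugging sketch under generic PyTorch assumptions (names like `safe_step` and the `max_norm` value are illustrative, not part of this repo):

```python
import math
import torch

# Report the op that first produces NaN/Inf during backward (slows training; debug only).
torch.autograd.set_detect_anomaly(True)

def safe_step(loss, model, optimizer, max_norm=10.0):
    """Skip non-finite losses and clip gradients before updating.

    Intended as a drop-in guard for a training loop; max_norm is an assumed value.
    """
    if not math.isfinite(loss.item()):
        print("non-finite loss, skipping this batch")
        optimizer.zero_grad()
        return False
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()
    optimizer.zero_grad()
    return True
```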