ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

WARNING: nan loss detected, ending training #496

Closed: broliao closed this issue 5 years ago

broliao commented 5 years ago

Thank you for your great work. When I run python3 train.py --transfer, I get this error. Could you tell me how I can solve it? Thanks!

Namespace(accumulate=1, arc='default', batch_size=16, bucket='', cache_images=False, cfg='cfg/yolov3-spp.cfg', data='data/coco.data', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, nosave=False, notest=False, prebias=False, rect=False, resume=False, transfer=True, weights='')
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1060 6GB', total_memory=6078MB)

Reading labels (117263 found, 0 missing, 0 empty for 117263 images): 100%|██████████| 117263/117263 [10:34<00:00, 184.85it/s]
Model Summary: 225 layers, 6.29987e+07 parameters, 457725 gradients

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
 0/272     2.29G      4.21      27.8      6.75      38.8        94       416:   1%|▏         | 102/7329 [00:44<37:37,  3.20it/s]
Exception ignored in: <bound method tqdm.__del__ of      0/272     2.29G      4.21      27.8      6.75      38.8        94       416:   1%|▏         | 102/7329 [00:44<37:37,  3.20it/s]>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 931, in __del__
WARNING: nan loss detected, ending training

glenn-jocher commented 5 years ago

@broliao your training loss diverged. I see you also did not specify any model to transfer learn from, so you are transfer learning from a randomly initialized model, which is not the intended use case.

weights=''
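
For context, the warning in the log is the kind of message a training loop prints when it checks each batch loss and finds the value is no longer a finite number. Below is a minimal sketch of such a guard in plain Python. It is not the repository's actual code: the function name and the list of loss values are hypothetical and only illustrate the idea.

```python
import math


def train_until_divergence(losses):
    """Walk through per-batch loss values and stop at the first non-finite one.

    `losses` is a hypothetical list of floats standing in for the values a
    real training loop would compute batch by batch.
    Returns (batches_completed, diverged).
    """
    completed = 0
    for loss in losses:
        # math.isfinite is False for NaN and for +/- infinity.
        if not math.isfinite(loss):
            print("WARNING: nan loss detected, ending training")
            return completed, True
        completed += 1
    return completed, False
```

Calling train_until_divergence([38.8, 35.1, float("nan"), 30.0]) stops after two completed batches, which mirrors how the run above ended part-way through the first epoch.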

broliao commented 5 years ago

@glenn-jocher I get it, thanks a lot. So should I do this for transfer learning?

parser.add_argument('--weights', type=str, default='weights/darknet53.conv.74', help='initial weights')

glenn-jocher commented 5 years ago

@broliao ah no, you want to transfer learn from a fully trained model:

python3 train.py --weights weights/yolov3-spp.weights --transfer

or train from the darknet53 backbone:

python3 train.py --weights weights/darknet53.conv.74
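
The two commands above differ from the original invocation in one respect: --transfer is never used without --weights. A hypothetical guard that would reject that combination early is sketched below. The parser is an assumption built only from the two flags discussed in this thread and is not the real train.py; the error message is also made up.

```python
import argparse


def parse_train_args(argv):
    # Hypothetical parser: only the two flags discussed in this thread.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--weights", type=str, default="", help="initial weights")
    parser.add_argument("--transfer", action="store_true", help="transfer learning")
    opt = parser.parse_args(argv)

    # Transfer learning from an empty weights path means starting from a
    # randomly initialized model, so refuse that combination up front.
    if opt.transfer and not opt.weights:
        parser.error("--transfer requires --weights pointing at a trained model")
    return opt
```

With this sketch, parse_train_args(["--transfer"]) exits with an argparse usage error, while both commands shown above parse normally.
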

Muxindawang commented 4 years ago

I have a question. I want to pretrain on ImageNet and save the weights as .pth. When I run python train.py --weights weights/best.pth, it shows "unrecognized arguments". How can I solve it? Thanks.

glenn-jocher commented 9 months ago

@Muxindawang, it seems there might be a misunderstanding with the command syntax. The --weights flag should work without issue. Make sure there are no typos and that the path to the weights file is correct. If best.pth is indeed the file you want to use, the command should be:

python3 train.py --weights best.pth

Ensure that best.pth is in the correct directory or provide the relative or absolute path to the file. If the problem persists, please check for any additional unrecognized arguments that might have been inadvertently included in your command.
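
One way to find out which token triggers an "unrecognized arguments" error is to ask argparse for the leftovers instead of letting it exit. The sketch below assumes a stand-in parser that defines only --weights; it is not the real train.py parser, and the helper name is hypothetical.

```python
import argparse


def find_unrecognized(argv):
    # Stand-in parser with a single known flag (an assumption for illustration).
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--weights", type=str, default="", help="initial weights")

    # parse_known_args returns (namespace, leftovers) rather than exiting
    # with "unrecognized arguments" when it meets tokens it does not know.
    _, extras = parser.parse_known_args(argv)
    return extras
```

For instance, find_unrecognized(["--weights", "weights/best.pth", "."]) returns ["."], showing that a stray token such as a lone "." after the path is enough to produce the error.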