anklebreaker opened this issue 1 year ago
Can you share your results.txt? And your hyp and cfg? Also, try running test.py, which will plot your bboxes and keypoints, just to check everything is in order.
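For anyone following along, a minimal test.py invocation, assuming the flags from the upstream yolov7 pose branch (this fork's flags may differ):

```bash
python test.py --data data/custom_kpts.yaml --weights runs/train/exp/weights/best.pt \
               --img 640 --conf 0.001 --iou 0.65 --task val --kpt-label
```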
@anklebreaker , I had the same issue.
You can have a look here: https://github.com/ruiz-manuel/yolov7-pose-custom/issues/1#issuecomment-1463645892
I added two lines of code, then it worked.
But I still have an issue: after several epochs, performance does not improve. I am looking into the loss functions right now.
I may have found the problem. In the calculation of the keypoint loss:

```python
kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / torch.sum(kpt_mask != 0)
```

there's a possibility that `torch.sum(kpt_mask != 0)` is 0, causing a divide-by-zero error. This is more likely on custom datasets and with mosaic augmentation. Since fixing this (adding a small float to the denominator), I've yet to see any NaNs pop up.
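As a minimal sketch of the guarded version (the epsilon value is my choice, not from the repo):

```python
import torch

def kpt_loss_factor(kpt_mask: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Ratio of all keypoints to visible ones, guarded against batches
    where no keypoint is visible."""
    num_visible = torch.sum(kpt_mask != 0)
    num_hidden = torch.sum(kpt_mask == 0)
    # eps keeps the division finite when num_visible == 0, which can happen
    # on custom datasets when mosaic cropping removes every labeled keypoint.
    return (num_visible + num_hidden) / (num_visible + eps)

# Example: a batch where every keypoint is unlabeled or cropped out.
mask = torch.zeros(8, 4)      # 8 targets, 4 keypoints each, none visible
print(kpt_loss_factor(mask))  # finite (large) factor instead of NaN
```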
I think you're right! I removed my 2 lines of code and added a small float to the denominator, and it worked nicely!
By the way, what are you using as the lkpt?

```python
lkpt += kpt_loss_factor * (torch.log(d + 1 + 1e-9) * kpt_mask).mean()
```

or

```python
lkpt += kpt_loss_factor * ((1 - torch.exp(-d / (2 * (s * sigmas) ** 2 + 1e-9))) * kpt_mask).mean()
```

I noticed @ruiz-manuel's loss works better.
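For reference, a self-contained sketch comparing the two variants; the tensor shapes are an assumption from reading the loss code (d holds squared predicted-vs-ground-truth keypoint distances, s the object scale):

```python
import torch

nkpt, num_targets = 4, 16
d = torch.rand(num_targets, nkpt) * 100.0                 # squared keypoint distances
kpt_mask = (torch.rand(num_targets, nkpt) > 0.3).float()  # 1 where the keypoint is labeled
s = torch.rand(num_targets, 1) * 50.0 + 1.0               # object scale (from bbox size)
sigmas = torch.full((nkpt,), 0.05)                        # per-keypoint OKS falloff

kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 1e-9)

# Variant 1: log-distance penalty; no per-keypoint sigmas to tune.
lkpt_log = kpt_loss_factor * (torch.log(d + 1 + 1e-9) * kpt_mask).mean()

# Variant 2: OKS-style penalty as in the paper; depends heavily on sigmas,
# which upstream were tuned for COCO person keypoints.
lkpt_oks = kpt_loss_factor * ((1 - torch.exp(-d / (2 * (s * sigmas) ** 2 + 1e-9))) * kpt_mask).mean()
```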
I used the first one, but I'm not sure how it was derived, since the second one looks like the formula in the paper. In the second one, the sigmas seem to have been tuned to values that work well for person pose keypoints, so I'm not sure how well it would adapt to custom tasks.
I am able to train successfully now on a custom nc=4, nkpt=4 dataset. The boxes look really good, but the keypoints are noisier than hoped.
Yeah, it's true!
Some people recommend tweaking the sigma values and/or using different formulas; others suggest a bigger dataset. I don't know which one would really help.
@anklebreaker
I used this loss:

```python
lkpt += kpt_loss_factor * (((1 - torch.exp(-d / (s * (4 * sigmas ** 2) + 1e-9))) + 0.05 * d) * kpt_mask).mean()
```

and these sigmas (I have 10 keypoints):

```python
sigmas = torch.tensor([.71, .73, .88, .77, .76, .79, .79, .72, .72, .87], device=device) / 10.0
```
It seemed to work quite well too.
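A runnable sketch of that combination, under the same shape assumptions as above; the 0.05 * d term keeps a gradient flowing where the exponential has saturated:

```python
import torch

device = "cpu"
nkpt = 10
sigmas = torch.tensor([.71, .73, .88, .77, .76, .79, .79, .72, .72, .87],
                      device=device) / 10.0

d = torch.rand(16, nkpt, device=device) * 100.0        # squared keypoint distances
kpt_mask = (torch.rand(16, nkpt, device=device) > 0.3).float()
s = torch.rand(16, 1, device=device) * 50.0 + 1.0      # object scale

kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 1e-9)

# OKS-style term plus a small linear term so far-off keypoints still get gradient.
lkpt = kpt_loss_factor * (((1 - torch.exp(-d / (s * (4 * sigmas ** 2) + 1e-9))) + 0.05 * d) * kpt_mask).mean()
```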
@nomaad42 Interesting, how did you go about tuning those sigmas?
I've been using the first loss; after 300 epochs it does learn a lot, but it still seems to lack performance. I'm currently letting it train longer in case convergence is just slow. Did you find that it took long to train, or is the second loss faster?
I took the loss from this PR: github.com/WongKinYiu/yolov7/pull/501
@ruiz-manuel ah ok, this makes more sense, thanks for the context!
Would definitely be interested in seeing more research into loss functions for keypoint prediction, especially with bounding boxes and an arbitrary number of classes/keypoints.
@anklebreaker Well, boxes were performing well with both loss functions; the problems I had were with keypoint detection performance.
I was choosing between several losses and different sigmas, and I noticed this combination works best, with performance otherwise nearly the same. The sigma values are just the result of tuning and watching the performance.
I know people somehow calculate the sigma values, and I am trying to figure out how they do that. Do you have any ideas?
I have only 50 images in train and 4 images in val. I train for 50 epochs, and it works quite well.
How many images are in your dataset?
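On the sigma question above: as far as I know, the COCO OKS sigmas were estimated from redundantly annotated images, as the per-keypoint standard deviation of annotator disagreement normalized by object scale. A rough sketch of that idea (the data layout here is hypothetical):

```python
import torch

def estimate_sigmas(kpts_a: torch.Tensor, kpts_b: torch.Tensor,
                    scale: torch.Tensor) -> torch.Tensor:
    """kpts_a, kpts_b: (N, nkpt, 2) keypoints from two independent annotation
    passes over the same N objects; scale: (N, 1), e.g. sqrt of bbox area.
    Returns one sigma per keypoint type."""
    err = torch.linalg.norm(kpts_a - kpts_b, dim=-1) / scale  # (N, nkpt)
    return err.std(dim=0)

# Hypothetical usage with stand-in annotations:
a = torch.rand(100, 10, 2) * 640
b = a + torch.randn(100, 10, 2) * 5.0  # a second pass that disagrees slightly
print(estimate_sigmas(a, b, torch.full((100, 1), 200.0)))
```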
Guess it's part of the tuning. For me, scaling down the keypoint loss has led to more stable training. I've got a custom dataset of 10k images.
@anklebreaker, how did you scale down the keypoint loss? Were you dividing the loss? If yes, by what value?
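For context, in the upstream pose loss each component is multiplied by a per-term gain before summing, so "scaling down" usually means lowering the keypoint gain rather than dividing the final loss; a sketch of the idea (the hyp key names and values are assumptions, and this fork may differ):

```python
def total_loss(lbox, lobj, lcls, lkpt, hyp):
    # Lowering hyp['kpt'] (e.g. from 0.025 to 0.010) scales down the keypoint
    # term relative to the box/objectness/class terms.
    return (lbox * hyp['box'] + lobj * hyp['obj'] +
            lcls * hyp['cls'] + lkpt * hyp['kpt'])
```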
Thanks so much for opening up the v7 training to multiple classes and different numbers of keypoints.
Training goes pretty well on a custom dataset for the first few epochs and then hits NaN values for the kpt loss. Do you have any suggestions on how to avoid this, or which hyperparameters to adjust?