Loss is NaN at the beginning

mrlooi / rotated_maskrcnn

Rotated Mask R-CNN: From Bounding Boxes to Rotated Bounding Boxes

MIT License


Loss is NaN at the beginning #16

Open ustczhouyu opened 4 years ago

ustczhouyu commented 4 years ago

❓ Questions and Help

Help! When I train on my own dataset, the loss is NaN from the beginning. Can anybody tell me how to deal with it? Thanks a lot! @mrlooi

mrlooi commented 4 years ago

That's odd, but without more info I can't really provide more help. Have you resolved it?
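One way to gather that extra information is to log which loss term goes non-finite first. The snippet below is only a rough sketch: the helper name `non_finite_terms` and the loss names and values in the example are hypothetical, and it assumes each loss term can be converted with `float()` (true for plain Python numbers and single-element tensors).

```python
import math


def non_finite_terms(loss_terms):
    """Return the names of loss terms whose value is NaN or +/-inf."""
    return [
        name
        for name, value in loss_terms.items()
        if not math.isfinite(float(value))
    ]


# Hypothetical per-iteration loss values, for illustration only.
step_losses = {
    "loss_classifier": 0.71,
    "loss_box_reg": float("nan"),
    "loss_objectness": 0.69,
}

bad = non_finite_terms(step_losses)
if bad:
    print("non-finite loss terms:", ", ".join(bad))
```

Posting the iteration number and the names of the offending terms along with the config usually makes a NaN report much easier to diagnose.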

HashiamKadhim commented 4 years ago

Are you working with a single GPU? If so, did you decrease the batch size so that the batch fits into GPU memory? If yes to both:

Set the SOLVER.BASE_LR in your model_config.yaml file about an order of magnitude lower (for example, set it to 0.0025).

A larger batch size gives you more stable gradient estimates, which lets you use a higher learning rate. When the batch size goes down, a good rule of thumb is that the learning rate should go down as well.
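The rule of thumb above is often applied as a linear scaling of the learning rate with the batch size. Here is a minimal sketch of that arithmetic. The function name `scaled_lr` is hypothetical, and the reference values (a learning rate of 0.02 at 16 images per batch) are assumed for illustration, so check them against your own config before relying on the result.

```python
def scaled_lr(reference_lr, reference_batch, new_batch):
    """Scale a learning rate linearly with the batch size."""
    if reference_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return reference_lr * new_batch / reference_batch


# Assumed reference setup (illustrative, not read from the repo config):
# learning rate 0.02 at 16 images per batch, reduced to 2 images per batch.
new_lr = scaled_lr(reference_lr=0.02, reference_batch=16, new_batch=2)
print(f"suggested SOLVER.BASE_LR: {new_lr:.4f}")
```

Under those assumed values the result matches the 0.0025 suggested above. Linear scaling is only a starting point, and the loss can still diverge if the first iterations are unstable.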

ustczhouyu commented 4 years ago

@HashiamKadhim @mrlooi Thank you very much. When I set the LR to 0.005, it works. But when I train the model on a dataset containing many small objects, I run into other difficulties:

1. The model detects two or more small objects that are close together in the horizontal or vertical direction as a single object.
2. The dataset has complex backgrounds, and some background regions have a texture similar to the foreground, which leads to false positives.

What should I do to solve these two problems? (For example, which parameters should be modified, or what kind of branch should be added?) Please help me.

Johnsyisme commented 1 year ago

Hi! I also ran into the same issue. I did some tests and the grad is always nan tmp