Dear author, thanks for the great work. When I try to train the DETR model weights following issue #3, the output of the transformer is always NaN, so training cannot proceed. Is this caused by half-precision floats (FP16) in the transformer? How can I solve this? Thanks again, and I look forward to your reply.