Hi, the training code starts successfully, but it crashes after 30 iterations of training. The error occurs in loss.backward(), which raises "RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered". I have checked the loss values, and they look normal. Can you help me? Thanks a lot.