Closed: CoinCheung closed this issue 4 years ago.
Are you sure about your kernel function LSRLossForward? It seems that sdata is never used, and nothing is done if tid is not 0. Not sure this is the root of your problem, but it's worth a try.
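(For reference, the pattern being described, where sdata holds per-thread partial results and only the thread with tid == 0 writes the output, is the standard block-wide shared-memory reduction. The sketch below is purely illustrative and uses hypothetical names, not code from this extension.)

```cuda
// Illustrative only: a generic block reduction in which sdata and the
// tid == 0 branch actually do work. Launch with blockDim.x a power of two
// and blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void block_sum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads one element (or 0 past the end) into shared memory.
    sdata[tid] = (idx < n) ? in[idx] : 0.f;
    __syncthreads();

    // Tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Only thread 0 holds the block total, so only it writes the result.
    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```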
Yes, my original logic needed shared memory, so I left sdata there unused, and the same goes for the tid check. Removing these two places does make the code and logic simpler, but the problem still exists after removing them.
I solved the problem; it was an issue with my own implementation. I am closing this. Thanks for the support!
Hi,
I am not sure whether this is a bug or a problem with my implementation. I am working in an Ubuntu 16.04 docker container, with Anaconda Python 3.6.9 and PyTorch 1.3.1. Since I am not sure where exactly the problem lies, my sample code is a bit long, though I have tried my best to remove irrelevant logic. The code is a softmax cross entropy loss, like this:
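(The original Python snippet is not reproduced here. For orientation only, below is a small, runnable pure-PyTorch stand-in for the same loss, written as a custom torch.autograd.Function so the forward/backward math the CUDA extension has to reproduce is explicit. This is an illustrative sketch, not the code from the issue.)

```python
import torch

class SoftmaxCrossEntropyFunction(torch.autograd.Function):
    """Pure-PyTorch stand-in for the custom loss; illustrative only."""

    @staticmethod
    def forward(ctx, logits, labels):
        # Numerically stable log-softmax, then mean negative log-likelihood.
        log_probs = torch.log_softmax(logits, dim=1)
        ctx.save_for_backward(log_probs, labels)
        return -log_probs.gather(1, labels.unsqueeze(1)).mean()

    @staticmethod
    def backward(ctx, grad_output):
        log_probs, labels = ctx.saved_tensors
        probs = log_probs.exp()
        one_hot = torch.zeros_like(probs).scatter_(1, labels.unsqueeze(1), 1.0)
        # d(mean NLL)/d(logits) = (softmax - one_hot) / batch_size
        grad_logits = (probs - one_hot) / labels.numel()
        return grad_output * grad_logits, None

# Usage: loss = SoftmaxCrossEntropyFunction.apply(logits, labels)
```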
The CUDA code is like this:
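(The kernel source is likewise not shown here. Purely for illustration, a per-sample softmax cross entropy forward kernel typically looks roughly like the following; the kernel and buffer names are hypothetical and not taken from this issue.)

```cuda
// Illustrative sketch only: one thread per sample computes the
// numerically stable log-sum-exp and the per-sample loss.
__global__ void softmax_ce_forward_sketch(const float *logits,   // [n_samples, n_classes]
                                           const int64_t *labels, // [n_samples]
                                           float *losses,         // [n_samples]
                                           int n_samples, int n_classes) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_samples) return;

    const float *row = logits + (int64_t)i * n_classes;

    // Subtract the row max before exponentiating for numerical stability.
    float m = row[0];
    for (int c = 1; c < n_classes; ++c) m = fmaxf(m, row[c]);

    float sum = 0.f;
    for (int c = 0; c < n_classes; ++c) sum += expf(row[c] - m);

    // loss_i = log(sum_j exp(x_ij - m)) + m - x_i,label_i
    losses[i] = logf(sum) + m - row[labels[i]];
}
```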
My problem is that the difference in gradient and loss between the CUDA extension implementation and nn.CrossEntropyLoss becomes too large after about 10 iterations, even though the gradient of the CUDA implementation matches nn.CrossEntropyLoss on the first iteration. How can I solve this problem?
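(One way to narrow this down is to train two identical copies of a small model side by side, one with nn.CrossEntropyLoss and one with the custom loss, and log the loss and gradient differences each iteration; drift that grows with the iteration count usually points at a subtly wrong backward pass rather than ordinary float noise. Below is a minimal sketch of such a harness; my_criterion is a placeholder to be replaced with the custom extension loss, stubbed here with F.cross_entropy so the script runs as-is.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d, c = 16, 32, 10

# Two identical copies of the same small model, one per loss.
net_ref = nn.Linear(d, c)
net_cus = nn.Linear(d, c)
net_cus.load_state_dict(net_ref.state_dict())

opt_ref = torch.optim.SGD(net_ref.parameters(), lr=0.1)
opt_cus = torch.optim.SGD(net_cus.parameters(), lr=0.1)

ref_criterion = nn.CrossEntropyLoss()
# Placeholder: swap in the custom CUDA-extension loss here.
my_criterion = lambda logits, labels: F.cross_entropy(logits, labels)

for it in range(20):
    x = torch.randn(n, d)
    y = torch.randint(0, c, (n,))

    opt_ref.zero_grad()
    opt_cus.zero_grad()
    loss_ref = ref_criterion(net_ref(x), y)
    loss_cus = my_criterion(net_cus(x), y)
    loss_ref.backward()
    loss_cus.backward()

    # Compare losses and weight gradients before the optimizer step.
    grad_diff = (net_ref.weight.grad - net_cus.weight.grad).abs().max().item()
    loss_diff = abs(loss_ref.item() - loss_cus.item())
    print(f"iter {it}: loss diff {loss_diff:.3e}, max grad diff {grad_diff:.3e}")

    opt_ref.step()
    opt_cus.step()
```

If the extension accepts float64 inputs, torch.autograd.gradcheck on a small double-precision batch is also a direct way to catch a wrong backward.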