Closed luomingshuang closed 2 years ago
When I use clip_grad_norm_ in place of clip_grad_value_, the score-difference bug no longer occurs. So the PR that removes some lines of code in k2/csrc/intersect_dense.cu may not be necessary. I also get a better result than with clip_grad_value_. I think this PR is ready to merge.
The result based on clip_grad_norm_:
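For anyone comparing the two clipping strategies, here is a minimal pure-Python sketch of how they differ. It is not the PyTorch implementation and not the code in this PR: the function names `clip_by_value` and `clip_by_norm` and the sample gradient values are made up for illustration, and the `eps` term is an assumption meant to mirror the usual guard against dividing by a zero norm.

```python
import math


def clip_by_value(grads, clip_value):
    """Clamp each gradient element independently to [-clip_value, clip_value].

    Hypothetical helper: large elements are truncated, so the direction
    of the gradient vector can change.
    """
    return [max(-clip_value, min(clip_value, g)) for g in grads]


def clip_by_norm(grads, max_norm, eps=1e-6):
    """Rescale the whole gradient vector so its L2 norm is at most max_norm.

    Hypothetical helper: every element is multiplied by the same factor,
    so the direction of the gradient vector is preserved.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up: the factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads]


# Illustrative gradient values only (not taken from any training run).
grads = [3.0, 4.0, 0.1]
by_value = clip_by_value(grads, 1.0)
by_norm = clip_by_norm(grads, 1.0)

print("clipped by value:", by_value)
print("clipped by norm: ", by_norm)
```

With value clipping, the two large elements end up equal, so their original 3:4 ratio is lost. With norm clipping, the ratio is kept and only the overall magnitude shrinks.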
This PR is ready to merge.
@danpovey, I have reproduced the result using the new mmi_bigram_{train, decode}.py with the latest snowfall.
Regarding the following bug: with Kangwei's help, my solution was to remove some lines of code in k2/k2/csrc/intersect_dense.cu and rebuild k2 before running the training script. The bug:
My solution:
So I have also opened a PR against k2 to remove the lines above.