I found the "--cost-scaling" option in src/common/config_parser.cpp, but I can't find the corresponding implementation.
So, does Marian support "Dynamic cost scaling for mixed precision training"? If not, how does it handle NaN loss values when optimizing in float16?
Thank you!