The "correct" code ought to simply add the original local threshold to the current threshold, so that each update is trained for the same number of iterations. When I do this, the error blows up. If I instead leave it so that each update doubles the previous number of iterations, the loss stays somewhat contained, but it doesn't really drop between updates, and certainly not within updates.
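
To make the two schedules concrete, here is a minimal sketch of how the threshold would evolve in each case. It is not the actual training code: the names `update_boundaries`, `iters_per_update`, `local_threshold` and `mode` are hypothetical, and I am assuming the doubling behaviour comes from adding the current threshold to itself rather than adding the original local threshold.

```python
def update_boundaries(local_threshold, num_updates, mode="add"):
    """Return the cumulative iteration count at which each update ends.

    mode="add":    threshold += local_threshold  (the intended behaviour)
    mode="double": threshold += threshold        (assumed source of the doubling)
    """
    if mode not in ("add", "double"):
        raise ValueError(f"unknown mode: {mode!r}")

    boundaries = []
    threshold = local_threshold
    for _ in range(num_updates):
        boundaries.append(threshold)
        if mode == "add":
            threshold += local_threshold  # every update gets the same length
        else:
            threshold += threshold        # the threshold doubles each time
    return boundaries


def iters_per_update(boundaries):
    """Convert cumulative boundaries into the number of iterations per update."""
    lengths = []
    previous = 0
    for boundary in boundaries:
        lengths.append(boundary - previous)
        previous = boundary
    return lengths


if __name__ == "__main__":
    for mode in ("add", "double"):
        bounds = update_boundaries(100, 4, mode=mode)
        print(mode, bounds, iters_per_update(bounds))
```

Printing the per-update lengths for both modes is a quick way to check that the additive version really gives every update the same number of iterations, which helps separate a scheduling bug from a genuine optimization problem such as a learning rate that is too high for the shorter updates.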