Hi, I saw you mentioned a "sudden convergence".
I think it's caused by the uniform sampling of timesteps during training.
If you sample only large timesteps (near 1000) at the start and gradually decay the range back to (0, 1000) over the course of training,
the sudden convergence arrives faster, even with a small batch size such as 4.
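A minimal sketch of the decaying-range timestep sampler described above. The function name, the linear decay schedule, and the starting lower bound are my assumptions, not from the original post:

```python
import random

def sample_timestep(step, total_steps, t_max=1000, t_min_start=800):
    """Curriculum timestep sampling (hypothetical schedule).

    Early in training, only large timesteps near t_max are drawn;
    the lower bound decays linearly to 0, so the sampling range
    gradually widens to the full (0, t_max) interval.
    """
    frac = min(step / total_steps, 1.0)      # training progress in [0, 1]
    t_min = int(t_min_start * (1.0 - frac))  # lower bound decays to 0
    return random.randint(t_min, t_max - 1)  # uniform over current range
```

For example, at step 0 this only returns timesteps in [800, 999], while by the end of training it samples uniformly over the whole [0, 999] range.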