Open zhaozhaoooo opened 5 months ago
Hi, and sorry for the late reply. Thank you for the question. Unfortunately, we have not encountered a similar problem in our training. There could be a variety of causes, and they would need to be investigated. Certainly, a "simple" task could easily be solved by the initial predictor alone, which would make the diffusion refiner unnecessary.
Hello, the residual approach is really interesting, but I found that when the coarse prediction from the first stage is already good, the residuals become very small. Then, after about 1000 training iterations, the model's predictions collapse to all zeros. How can this be solved? Would reducing the number of timesteps help?
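
For anyone hitting the same behaviour, a quick check is to measure how small the residuals actually are relative to the targets. The snippet below is only a rough sketch and not taken from the repository: `residual_stats` and `rescale_residual` are hypothetical helper names, the residual is assumed to be `target - initial_pred`, and the inputs are assumed to be plain NumPy arrays. Rescaling the residual to unit standard deviation is just one thing that might be worth trying; it has not been confirmed by the authors as a fix.

```python
import numpy as np


def residual_stats(target, initial_pred, eps=1e-12):
    """Return (residual RMS, residual RMS / target RMS) for one batch.

    Assumes the residual is defined as target - initial_pred.
    """
    target = np.asarray(target, dtype=np.float64)
    initial_pred = np.asarray(initial_pred, dtype=np.float64)
    residual = target - initial_pred
    res_rms = float(np.sqrt(np.mean(residual ** 2)))
    tgt_rms = float(np.sqrt(np.mean(target ** 2)))
    return res_rms, res_rms / (tgt_rms + eps)


def rescale_residual(target, initial_pred, eps=1e-12):
    """Divide the residual by its standard deviation.

    Returns (scaled_residual, scale). Multiply a refiner output by
    `scale` to get back to the original units.
    """
    target = np.asarray(target, dtype=np.float64)
    initial_pred = np.asarray(initial_pred, dtype=np.float64)
    residual = target - initial_pred
    scale = float(residual.std()) + eps
    return residual / scale, scale


if __name__ == "__main__":
    # Synthetic data standing in for a good first-stage prediction.
    rng = np.random.default_rng(0)
    target = rng.normal(size=(8, 8))
    initial_pred = target - 0.01 * rng.normal(size=(8, 8))

    res_rms, ratio = residual_stats(target, initial_pred)
    print(f"residual RMS: {res_rms:.4f}, relative to target: {ratio:.4f}")

    scaled, scale = rescale_residual(target, initial_pred)
    print(f"scale: {scale:.4f}, scaled std: {scaled.std():.4f}")
```

If the relative RMS turns out to be tiny, predicting all zeros is already close to the best the refiner can do in the original units, which would be consistent with the collapse described above.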