Closed soapmactavish-byf closed 3 weeks ago
Thanks for your interest in this work. Because the initial query points are far fewer than the gt points, they can only learn a rough geometric distribution of the entire dataset, i.e. they underfit. The local-optimum case you mention only needs to be considered when there are more initial query points than gt points.
When I was training the tiny model, I found that init_points_loss converges after a few epochs and no longer decreases. I guess this is characteristic of the CD loss between 600 query points and a much larger number of gt points (20,000+), and I am worried that it will cause the model to fall into a local optimum from the very beginning.
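For illustration, here is a minimal NumPy sketch of why such a loss can flatten out early. It assumes the CD loss is the symmetric Chamfer distance with squared Euclidean distances; the actual init_points_loss in the repository may be defined differently. The function name `chamfer_distance` and the random point sets are hypothetical stand-ins, not taken from the codebase.

```python
import numpy as np


def chamfer_distance(queries, targets):
    """Symmetric Chamfer distance using mean squared nearest-neighbour distances.

    Assumption: this is a generic formulation, not necessarily the one
    used for init_points_loss in the repository.
    """
    # Pairwise squared distances, shape (num_queries, num_targets).
    q2 = np.sum(queries ** 2, axis=1)[:, None]
    t2 = np.sum(targets ** 2, axis=1)[None, :]
    d2 = np.maximum(q2 + t2 - 2.0 * queries @ targets.T, 0.0)

    # Each query to its nearest target, and each target to its nearest query.
    query_to_target = d2.min(axis=1).mean()
    target_to_query = d2.min(axis=0).mean()
    return query_to_target + target_to_query


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 20,000 gt points and 600 queries placed exactly on gt points.
    gt = rng.uniform(-1.0, 1.0, size=(20000, 3))
    queries = gt[rng.choice(len(gt), size=600, replace=False)]
    # The query-to-gt term is (numerically) zero here, but the gt-to-query
    # term stays positive because 600 points cannot cover 20,000.
    print(chamfer_distance(queries, gt))
```

Even when every query sits exactly on a gt point, the gt-to-query term remains positive, so with far fewer queries than gt points the loss has a nonzero floor. A plateau after a few epochs is therefore consistent with the queries having settled into a coarse coverage of the data.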