Open eeeric-code opened 11 hours ago
Hi, thank you for your question. Is this for the pretrained checkpoint, or are you retraining the model?
The results for the pretrained checkpoint should be identical to the paper's results.
I have retrained the model.
With the retrained checkpoint, the results are close to the paper's results when sam_tt_norm=True, crop=True, remove_bad_exemplar=True, but they differ from the paper's results when sam_tt_norm=False, crop=False, remove_bad_exemplar=False. That's weird.
Ah okay, got it. I have not released the training code yet, so I cannot reproduce your results on my side, but I can still speculate. This may be due to high variance caused by a couple of examples in the test set with very high counts driving up the RMSE. You can check this by omitting examples with more than 900 objects when you calculate the test error; if this results in a significant improvement to the error, then this is probably the issue. To make the training code more robust to this issue, you could apply the adaptive cropping in the early-stopping code, so that adaptive cropping is applied when the validation set error is evaluated during early stopping. Right now there is some other source of non-determinism in the posted code besides the seed, and adaptive cropping is not applied during early stopping.
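To illustrate that check, here is a minimal sketch of comparing the test error with and without high-count images. The count arrays are hypothetical placeholders; you would substitute your model's predicted counts and the FSC147 ground-truth counts:

```python
import numpy as np

# Hypothetical ground-truth and predicted counts for a test set.
# One image with a very high count (950) stands in for the
# high-count FSC147 test examples that can dominate the RMSE.
gt = np.array([12.0, 45.0, 7.0, 950.0, 30.0])
pred = np.array([14.0, 40.0, 9.0, 700.0, 28.0])

def mae_rmse(gt, pred):
    """Return (MAE, RMSE) over paired count arrays."""
    err = pred - gt
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())

# Full test set: the single high-count image drives up the RMSE.
mae_all, rmse_all = mae_rmse(gt, pred)

# Same metrics, omitting examples with more than 900 objects.
mask = gt <= 900
mae_filt, rmse_filt = mae_rmse(gt[mask], pred[mask])

print(f"all:      MAE={mae_all:.2f}  RMSE={rmse_all:.2f}")
print(f"filtered: MAE={mae_filt:.2f} RMSE={rmse_filt:.2f}")
```

If the filtered RMSE is dramatically lower than the full-set RMSE, a handful of high-count images are the likely source of the variance.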
Thanks a lot! I will try it.
RE: With the retrained checkpoint, the results are close to the paper's results when sam_tt_norm=True, crop=True, remove_bad_exemplar=True, but they differ from the paper's results when sam_tt_norm=False, crop=False, remove_bad_exemplar=False. That's weird.
The main results in the paper have sam_tt_norm=True and remove_bad_exemplar=True, so it is not weird. We report results without these options in the appendix, pasted below:
The influence of these options is described in the appendix here:
Sorry, I made a mistake in my previous response. It should be: with the retrained checkpoint, the results are close to the paper's results when sam_tt_norm=False, crop=False, remove_bad_exemplar=False, but they differ from the paper's results when sam_tt_norm=True, crop=True, remove_bad_exemplar=True.
I have noticed that and conducted some ablation studies, but I still cannot identify the factors affecting the results.
Yes, so try turning on these options during the early-stopping procedure to improve the robustness of the method to these settings. To reduce the variance of the method in general, look for other sources of non-determinism in the code (other than the seed) using PyTorch's reproducibility notes, https://pytorch.org/docs/stable/notes/randomness.html, and remove them as much as possible.
Thanks! Let me check.
Hi, this is great work! When reproducing this project, I am able to achieve performance close to that reported in the paper on the FSC147 val set. When testing on the FSC147 test set with the same checkpoint, we obtain MAE≈11 and RMSE≈100 (sam_tt_norm=False, crop=False, remove_bad_exemplar=False), which is close to the paper's results of MAE=10.92 and RMSE=99.58. However, we only observe MAE≈7 and RMSE≈80 with sam_tt_norm=True, crop=True, remove_bad_exemplar=True, which differs from the paper's results of MAE=5.74 and RMSE=24.09. Could you please advise on the potential reasons for this discrepancy, given that some of the reproduction results are consistent?