Hi👋, thanks for using our code!
Using EMA or not does make some difference, so I took a look at the source code. The implementation here follows the corresponding function in AttentiveNAS, and I have verified that the Once-for-All (OFA) network uses a similar approach: the same logic appears in the code from OFA and the code from AttentiveNAS. Both are migrated from the official PyTorch implementation, but AttentiveNAS comments that part out and does not implement the update, which is why the EMA is left unimplemented.
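For context, the logic both repos migrated is the running-stats update in PyTorch's `_BatchNorm.forward`. A paraphrased sketch (not the verbatim upstream source):

```python
# Paraphrased from torch.nn.modules.batchnorm._BatchNorm.forward.
if self.momentum is None:
    exponential_average_factor = 0.0
else:
    exponential_average_factor = self.momentum

if self.training and self.track_running_stats:
    if self.num_batches_tracked is not None:
        self.num_batches_tracked.add_(1)
        if self.momentum is None:
            # momentum=None: cumulative moving average over all batches seen so far
            exponential_average_factor = 1.0 / float(self.num_batches_tracked)
        else:
            # fixed momentum: exponential moving average
            exponential_average_factor = self.momentum
```

Note that if the `num_batches_tracked` block is commented out, the factor stays at 0.0 whenever `momentum` is `None`, so the running statistics are never updated at all.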
However, the relevant line of the DynamicBN function in AttentiveNAS also carries a comment with some related links, and I could not find a specific reason for not using momentum in the linked paper or webpage. You can check out those links; we would be grateful for your feedback.
Best regards.
Hi, thanks for your great work! I have a question regarding your implementation of `reset_running_stats_for_calibration`. I see that you set `bn.momentum = None`, which in PyTorch means the BN layers are calibrated with a cumulative moving average (a simple average over all calibration batches) rather than the exponential moving average used with a fixed momentum. Why did you do it this way rather than leaving `bn.momentum` unchanged? I'm looking forward to hearing your feedback. Best regards,
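(For readers following along: a calibration reset of this kind typically looks like the sketch below. The function body is illustrative and assumed from the discussion above, not the repo's exact code.)

```python
import torch.nn as nn

def reset_running_stats_for_calibration(model: nn.Module) -> None:
    """Illustrative sketch: prepare all BN layers for recalibration."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm)):
            m.training = True        # update running stats during forward passes
            m.momentum = None        # None -> cumulative moving average in PyTorch
            m.reset_running_stats()  # reset running_mean/var and num_batches_tracked
```

With `momentum = None`, every calibration batch contributes equally to the final statistics, whereas keeping a fixed momentum would weight the most recent batches exponentially more.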