jsd loss - GitHub Issues

tfriedel commented 5 months ago

Hi, many thanks for making your code open source! I'm currently experimenting with it and have integrated it into the timm training pipeline.

I saw you have a flag for the jsd loss. Did you use this for the results in the paper? I assume no, because it wasn't mentioned and it would make training quite a bit slower.

But since the option is there, did you try it? Did it help?

hzlsaber commented 5 months ago

Hi, thank you for your interest in our work!

Regarding your question about the JSD loss: yes, we did use it for the results reported in our paper. Although it was not explicitly mentioned, JSD loss was instrumental in achieving the performance we reported. We found that it significantly improves accuracy and other safety metrics, although it does lengthen training. We acknowledge this oversight and plan to update the paper to mention JSD loss at an appropriate place.

We included the JSD loss flag in our codebase to offer users the option to explore its benefits, acknowledging that the extended training time might be a worthy trade-off for the gains in model performance and robustness.
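For readers unfamiliar with the term, here is a minimal sketch of a Jensen-Shannon consistency loss in the style of AugMix: the clean image and two augmented views are each passed through the model, and the three predicted distributions are pulled toward their average. It is written in plain Python for a single sample so it runs without dependencies. The function names and the weight of 12 (the value used in the AugMix reference code) are assumptions for illustration and are not taken from the IPMix codebase.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-7):
    """KL(p || q); q is clamped from below to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def jsd_consistency(logits_clean, logits_aug1, logits_aug2):
    """Jensen-Shannon divergence among the three predicted distributions."""
    probs = [softmax(l) for l in (logits_clean, logits_aug1, logits_aug2)]
    # Mixture distribution: element-wise mean of the three predictions.
    mixture = [sum(col) / 3.0 for col in zip(*probs)]
    return sum(kl_divergence(p, mixture) for p in probs) / 3.0


def cross_entropy(logits, label):
    """Standard cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def total_loss(logits_clean, logits_aug1, logits_aug2, label, jsd_weight=12.0):
    """Classification loss on the clean view plus the weighted consistency term.

    jsd_weight=12.0 is assumed here (AugMix reference value), not confirmed for IPMix.
    """
    ce = cross_entropy(logits_clean, label)
    return ce + jsd_weight * jsd_consistency(logits_clean, logits_aug1, logits_aug2)
```

The consistency term is zero when all three views produce the same prediction and grows as they disagree, which is why it requires three forward passes per training sample.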

Best wishes, Zhenglin

tfriedel commented 5 months ago

Thanks Zhenglin! I can confirm I also get better results when using JSD loss. I am a bit surprised that PixMix, which was follow-up work on AugMix, doesn't use it.

hzlsaber commented 5 months ago

Hi,

Thank you for your reply!

I agree with your points. In fact, it's mentioned in PixMix that "the Jensen-Shannon Divergence consistency loss, which requires at least thrice the memory per batch." I think this may be the reason why the authors don't use it.
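To make the memory point concrete, here is a rough sketch of how a JSD training step is commonly batched: the clean view and the two augmented views are concatenated into one batch for a single forward pass, so the model processes three times as many samples. Plain Python lists stand in for tensors, and `stack_views` and `split_views` are hypothetical helper names, not functions from PixMix, AugMix, or IPMix.

```python
def stack_views(clean, aug1, aug2):
    """Concatenate the three views along the batch dimension."""
    if not (len(clean) == len(aug1) == len(aug2)):
        raise ValueError("all three views must have the same batch size")
    return clean + aug1 + aug2


def split_views(outputs, batch_size):
    """Split model outputs back into (clean, aug1, aug2) chunks."""
    if len(outputs) != 3 * batch_size:
        raise ValueError("expected outputs for exactly three views")
    return (
        outputs[:batch_size],
        outputs[batch_size:2 * batch_size],
        outputs[2 * batch_size:],
    )
```

With a nominal batch size of 2, the model sees 6 samples per step, so activation memory grows roughly threefold.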

Besides, in my view, another possible reason is that PixMix generates images in a simpler way: within a single pipeline, it mixes an image with either a fractal image or an augmented version of the original image. This does not require JSD loss to keep the generated images semantically consistent. I also suspect that JSD loss might not yield better results for PixMix, whereas the methods used in AugMix and IPMix are more complex. IPMix, in particular, merges images across multiple levels and uses more complex mixing methods (e.g., random mixing), hence the need for JSD loss to keep predictions consistent.

If you are interested in this topic, you might consider trying out JSD loss with PixMix to observe its effects.

If you have more questions, feel free to contact me.

Best wishes, Zhenglin

hzlsaber / IPMix

jsd loss #8