lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0

Allow parallel patch based training #31

Closed: ryancinsight closed this issue 2 years ago

ryancinsight commented 2 years ago

Currently, Ranger21's variance normalization occasionally produces NaNs and faults when used in data-parallel training, i.e. division by zero. This can be mitigated by adding eps to the denominator, and I have not observed any difference in results.
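
Below is a minimal sketch of the eps mitigation described above, assuming the failure is a zero denominator in the variance-normalization step; `variance_normalize` and the exact formula are illustrative placeholders, not Ranger21's actual code:

```python
import torch

def variance_normalize(grad: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize a gradient by its standard deviation, guarded with eps.

    Hypothetical sketch of the mitigation, not Ranger21's implementation.
    """
    std = grad.std()
    # Without eps, std == 0 (e.g. a flat gradient on one data-parallel
    # worker) causes division by zero, and the resulting NaNs propagate
    # through subsequent optimizer steps.
    return grad / (std + eps)
```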

twmht commented 2 years ago

@ryancinsight

Why did you close this?

ryancinsight commented 2 years ago

@twmht It just seemed more like a hack, and there is probably an underlying issue. This is my main issue with Ranger21 regardless of parallel training; when it happens later in training, other problems start occurring as well.

lessw2020 commented 2 years ago

Hi @ryancinsight and @twmht, just a quick note that @nestordemeure, another contributor, and I will be refreshing and improving things in a Ranger22 version.
As part of that, I'm hoping we can address the multi-GPU case properly. I'm now actively involved in some very large scale model training, so I'm in a better position to understand and fix these issues than I was last year.