Closed by ryancinsight 2 years ago
@ryancinsight
Why did you close this?
@twmht It just seemed more like a hack, and there is probably an underlying issue. This is probably my main issue with ranger21 regardless of parallel training, and when it happens later in training, other issues start occurring.
Hi @ryancinsight and @twmht,
Just a quick note that @nestordemeure, another person, and I will be refreshing and improving things with a Ranger22 version.
As part of that, I'm hoping we can address the multi-GPU case properly. I'm now actively involved in some very large-scale model training, so I'm in a better position to understand and fix any issues than I was last year.
Currently, the ranger21 variance normalization occasionally produces NaNs and faults when used in data-parallel training, i.e. division by zero. This can be mitigated using eps, and I have not observed any difference in results.
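
For anyone trying to picture the failure, here is a minimal pure-Python sketch of the idea. It is not ranger21's actual code: the helper `normalize_by_std` is hypothetical, and adding eps to the standard deviation in the denominator is my assumption about where the guard would go. It only shows how a zero-variance gradient leads to a division by zero, and how a small eps avoids it.

```python
import math

def normalize_by_std(values, eps=0.0):
    """Hypothetical helper: divide each value by (std of the list + eps)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    denom = math.sqrt(variance) + eps
    if denom == 0.0:
        # Plain Python raises ZeroDivisionError here; float tensors instead
        # return nan (0/0) or inf (x/0), so we mimic that behavior.
        return [math.nan if v == 0 else math.copysign(math.inf, v)
                for v in values]
    return [v / denom for v in values]

# An all-zero gradient has zero variance, so the unguarded version
# divides 0 by 0.
zero_grad = [0.0, 0.0, 0.0, 0.0]
unguarded = normalize_by_std(zero_grad)           # all nan
guarded = normalize_by_std(zero_grad, eps=1e-8)   # all 0.0

# For a gradient with non-trivial variance, eps barely changes the result.
typical = normalize_by_std([1.0, -1.0], eps=1e-8)
```

With a typical gradient the eps term is negligible next to the standard deviation, which would be consistent with not seeing a difference in results.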