Closed: xrsrke closed this 8 months ago
TODO
MixedFusedLayerNorm
@NouamaneTazi
Both done. I ran fused layer norm and fast layer norm for 10k update steps each; here are the convergence results.
Just reran it after the new changes.
TODO
MixedFusedLayerNorm from megatron