Closed nopperl closed 7 months ago
I tried implementing the topology-agnostic optimizer state loading for the pipeline parallel dimension (#38). This also fixes the issue I had in #68.