Closed inkcherry closed 4 months ago
@inkcherry, UCP supports converting PP to and from other parallelism topologies (ZeRO-DP, SP, TP, etc.). However, training with the SP/PP combination, with or without UCP, has not been tested.
Thank you for your explanation, @samadejacobs. Yes, I believe using SP without PP would be more stable, so I tried the following:
--no-pipeline-parallel
, from some past workloads).
--no-pipeline-parallel
, with ds-sp).
But I encountered a crash at step 2: the weight names have changed, so UCP does not work in this case, as mentioned in the documentation change. :)
Is my understanding correct? Because of compatibility issues with PP, it has to be disabled in some cases (for example, when using ds-SP), which means that some weights that previously relied on PP cannot be used directly.
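For anyone hitting the same crash, one way to confirm that the problem is a naming mismatch is to diff the parameter names stored in the checkpoint against the names the non-PP model expects. The sketch below is plain Python and does not call any DeepSpeed or UCP API. The helper `diff_param_names` and the example weight names are hypothetical and only illustrate the idea; the real names depend on your model and checkpoint layout.

```python
def diff_param_names(saved_names, expected_names):
    """Return the names that appear on only one side of the comparison."""
    saved, expected = set(saved_names), set(expected_names)
    return {
        # Names the model asks for but the checkpoint does not contain.
        "missing_in_checkpoint": sorted(expected - saved),
        # Names the checkpoint contains but the model does not ask for.
        "unexpected_in_checkpoint": sorted(saved - expected),
    }


# Hypothetical names, for illustration only (not taken from a real checkpoint).
pp_names = ["3.input_layernorm.weight", "3.mlp.dense_h_to_4h.weight"]
no_pp_names = [
    "encoder.layers.0.input_layernorm.weight",
    "encoder.layers.0.mlp.dense_h_to_4h.weight",
]

report = diff_param_names(pp_names, no_pp_names)
for kind, names in report.items():
    print(kind, names)
```

If both lists in the report are non-empty and the entries differ only by prefix, the weights were most likely saved under a different naming scheme rather than being absent.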