zero3: The checkpoint being loaded used a DP world size of 8 but the current world size is 16. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported. #439
raise ZeRORuntimeException("The checkpoint being loaded used a DP " \
[rank5]: deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 8 but the current world size is 16. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported.
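As the exception states, ZeRO cannot reshard its partitioned optimizer state from 8 data-parallel ranks to 16 at load time. A common workaround, if resuming at the original world size is not an option, is to reconstruct a consolidated fp32 state dict from the ZeRO shards with DeepSpeed's zero_to_fp32 utilities and resume from the model weights only (the optimizer state is rebuilt from scratch). Below is a minimal sketch; the checkpoint directory and tag names are hypothetical placeholders, not values from this issue.

```python
# Sketch: rebuild a full fp32 state dict from the ZeRO-partitioned checkpoint
# so the weights can be loaded under a different DP world size. Optimizer
# state is NOT recovered this way; it is reinitialized after resharding.
# "checkpoints/gpt2" and "global_step1000" are example placeholders.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

ckpt_dir = "checkpoints/gpt2"    # directory passed to engine.save_checkpoint
tag = "global_step1000"          # checkpoint tag (subdirectory name)

# Gathers the partitioned parameters into a single state dict on CPU;
# for large models this needs enough host RAM to hold the full fp32 weights.
state_dict = get_fp32_state_dict_from_zero_checkpoint(ckpt_dir, tag=tag)

# Save a plain PyTorch checkpoint that any world size can load.
torch.save(state_dict, "consolidated_fp32.pt")
```

DeepSpeed also writes a zero_to_fp32.py script into the checkpoint directory that performs the same conversion from the command line. Either way, only the module weights survive the resharding, so momentum/variance statistics start fresh at the new world size of 16.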