huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.23k stars 122 forks

Script to fix duplicated ".safetensors" in checkpoints naming #151

Closed: NouamaneTazi closed 6 months ago

NouamaneTazi commented 6 months ago

Fixes a bug where '{type.value}_{suffix_name}.safetensors' was duplicated inside checkpoint file names.

For example, this script renames:

checkpoints/10/model/model/decoder/0/pp_block/attn/o_proj/model_model_weight.safetensors_pp-rank-0-of-1_tp-rank-0-of-2.safetensors
to
checkpoints/10/model/model/decoder/0/pp_block/attn/o_proj/model_weight_pp-rank-0-of-1_tp-rank-0-of-2.safetensors

Example Usage:

python scripts/fix_checkpoint_bad_naming.py checkpoints/10
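
For illustration, the rename described above could be sketched roughly as follows. This is not the code of scripts/fix_checkpoint_bad_naming.py; it is a minimal sketch inferred only from the example paths in this description. The names `fix_name` and `fix_checkpoint_dir`, the regular expression, and the `dry_run` parameter are all assumptions made for the example.

```python
import re
from pathlib import Path

# Assumed shape of a bad name, inferred from the example above:
#   <type>_<type>_<suffix>.safetensors_<rank info>.safetensors
# e.g. model_model_weight.safetensors_pp-rank-0-of-1_tp-rank-0-of-2.safetensors
_BAD_NAME = re.compile(
    r"^(?P<type>[^_]+)_(?P=type)_(?P<suffix>.+?)\.safetensors_(?P<rest>.+\.safetensors)$"
)


def fix_name(filename: str) -> str:
    """Return the corrected file name, or the input unchanged if it does not match."""
    m = _BAD_NAME.match(filename)
    if m is None:
        return filename
    return f"{m['type']}_{m['suffix']}_{m['rest']}"


def fix_checkpoint_dir(root, dry_run=True):
    """Collect (old, new) path pairs under root; rename only when dry_run is False."""
    renames = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*.safetensors")):
        new_name = fix_name(path.name)
        if new_name != path.name:
            target = path.with_name(new_name)
            renames.append((path, target))
            if not dry_run:
                path.rename(target)
    return renames


if __name__ == "__main__":
    # Dry run by default: print the planned renames without touching any file.
    for old, new in fix_checkpoint_dir("checkpoints/10"):
        print(f"{old} -> {new}")
```

Defaulting to a dry run is a design choice of this sketch, so that the planned renames can be reviewed before any checkpoint file is touched. File names that already follow the correct pattern are left unchanged.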

zzhhjjj commented 6 months ago

LGTM!