ssbuild / chatglm3_finetuning

zero2fp32 error #27

Closed LeonG7 closed 11 months ago

LeonG7 commented 11 months ago

transformers 4.35.0
torch 2.1.0
dataclasses 0.6
deep-training 0.2.7.post2
deepspeed 0.11.1

AttributeError: 'ModelArguments' object has no attribute 'model_custom'

**python zero_to_fp32.py . pytorch_model.bin**

[2023-11-06 10:30:38,651] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint './checkpoint'
Detected checkpoint of type zero stage 2, world_size: 8
Traceback (most recent call last):
  File "/mnt/volumes/dip-ulan-cpfs/NLP/Spotter_gpt/train_voc/chatglm3-finetune-all/20231103_prompt4_gptcot_4542_20epoch/last/zero_to_fp32.py", line 587, in <module>
    convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir, args.output_file, tag=args.tag)
  File "/mnt/volumes/dip-ulan-cpfs/NLP/Spotter_gpt/train_voc/chatglm3-finetune-all/20231103_prompt4_gptcot_4542_20epoch/last/zero_to_fp32.py", line 523, in convert_zero_checkpoint_to_fp32_state_dict
    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)
  File "/mnt/volumes/dip-ulan-cpfs/NLP/Spotter_gpt/train_voc/chatglm3-finetune-all/20231103_prompt4_gptcot_4542_20epoch/last/zero_to_fp32.py", line 509, in get_fp32_state_dict_from_zero_checkpoint
    return _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir)
  File "/mnt/volumes/dip-ulan-cpfs/NLP/Spotter_gpt/train_voc/chatglm3-finetune-all/20231103_prompt4_gptcot_4542_20epoch/last/zero_to_fp32.py", line 210, in _get_fp32_state_dict_from_zero_checkpoint
    zero_model_states = parse_model_states(model_files)
  File "/mnt/volumes/dip-ulan-cpfs/NLP/Spotter_gpt/train_voc/chatglm3-finetune-all/20231103_prompt4_gptcot_4542_20epoch/last/zero_to_fp32.py", line 98, in parse_model_states
    state_dict = torch.load(file, map_location=device)
  File "/home/jovyan/miniconda3/envs/nlp3.9/lib/python3.9/site-packages/torch/serialization.py", line 1014, in load
    return _load(opened_zipfile,
  File "/home/jovyan/miniconda3/envs/nlp3.9/lib/python3.9/site-packages/torch/serialization.py", line 1422, in _load
    result = unpickler.load()
  File "/home/jovyan/miniconda3/envs/nlp3.9/lib/python3.9/site-packages/deep_training/data_helper/base_args.py", line 11, in __dict__
    return asdict(self)
  File "/home/jovyan/miniconda3/envs/nlp3.9/lib/python3.9/dataclasses.py", line 1075, in asdict
    return _asdict_inner(obj, dict_factory)
  File "/home/jovyan/miniconda3/envs/nlp3.9/lib/python3.9/dataclasses.py", line 1082, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
**AttributeError: 'ModelArguments' object has no attribute 'model_custom'**

LeonG7 commented 11 months ago

Yes, I want to convert the ZeRO checkpoint to a .pt file.

ssbuild commented 11 months ago

Solved in deep_training 0.2.7.post4, which fixes the dataclass serialization. Upgrade with: pip install -U deep_training==0.2.7.post4
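
For anyone hitting the same traceback, here is a minimal sketch of one way this failure can arise. It is an assumption inferred from the traceback (the `__dict__` property in base_args.py calling `asdict(self)`), not the actual deep_training source. `DemoArgs` and `restore_like_unpickler` are hypothetical names, and declaring `model_custom` with a `default_factory` is also an assumption. The idea: when restoring an object, the unpickler creates the instance without running `__init__` and then reads `inst.__dict__`, so a `__dict__` property that calls `asdict` runs before any field has been set.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DemoArgs:
    # Hypothetical stand-in for ModelArguments. A default_factory field
    # leaves no class-level attribute to fall back on.
    model_custom: dict = field(default_factory=dict)

    @property
    def __dict__(self):
        # Mirrors the pattern visible in the traceback: __dict__ -> asdict(self)
        return asdict(self)


def restore_like_unpickler(cls):
    """Imitate the first steps pickle takes when rebuilding an instance."""
    inst = cls.__new__(cls)  # instance created without calling __init__
    try:
        inst.__dict__  # pickle reads this to restore the saved state
    except AttributeError as exc:
        return str(exc)
    return None


# A normally constructed instance works, because __init__ set the field.
normal = DemoArgs().__dict__

# An instance built the way the unpickler builds it fails inside asdict().
message = restore_like_unpickler(DemoArgs)
print(message)
```

If this is what happened here, the checkpoint files themselves are fine and only the class definition used while loading needs to change, which matches the fix being a package upgrade rather than a retrain.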