microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

DeepSpeed-Chat Step-1 training error #813

Open yifan-bao opened 10 months ago

yifan-bao commented 10 months ago

Hi, I cannot run step-1 SFT training after the refactoring. I installed DeepSpeed with:

pip install "deepspeed>=0.9.0"

Then I set up applications/DeepSpeed-Chat as follows:

git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
pip install -e .

Then I went into the applications/DeepSpeed-Chat/training/step1_supervised_finetuning folder and ran bash training_scripts/opt/single_gpu/run_1.3b.sh. This fails with the following error:

Traceback (most recent call last):
  File "/scratch/ybao/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 394, in <module>
    main()
  File "/scratch/ybao/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 264, in main
    train_dataset, eval_dataset = create_prompt_dataset(
  File "/scratch/ybao/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/utils/data/data_utils.py", line 350, in create_prompt_dataset
    return torch.load(train_fname), torch.load(eval_fname)
  File "/users/ybao/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/serialization.py", line 1014, in load
    return _load(opened_zipfile,
  File "/users/ybao/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/serialization.py", line 1422, in _load
    result = unpickler.load()
  File "/users/ybao/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/serialization.py", line 1415, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'utils'

From the error message, I cannot tell which utils module is missing or where to look, since there are many utils modules in this project. Separately, since the refactoring, the train.py file in the applications/DeepSpeed-Chat/ folder is missing, but the README still refers to it, which is very misleading.

Hope someone can help me identify the problem. Thanks.

george-kuanli-peng commented 10 months ago

I cloned the code today and do not have the missing utils problem.
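One possible explanation, and it is only a guess: the traceback fails inside torch.load, which unpickles the dataset files that create_prompt_dataset reads back. Pickle stores a class as a module path plus a class name, not as code. If those files were written before the refactoring, they may still point at a top-level utils module that has since moved under dschat. The sketch below reproduces that failure mode with plain pickle. The module name old_utils_pkg (standing in for the old utils) and the class name PromptDataset are hypothetical, chosen only for illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the pre-refactoring top-level "utils" module.
OLD_MODULE_NAME = "old_utils_pkg"
old_module = types.ModuleType(OLD_MODULE_NAME)


class PromptDataset:
    """Hypothetical dataset class that used to live in the old module."""

    def __init__(self, samples):
        self.samples = samples


# Make the class look as if it were defined inside the old module.
PromptDataset.__module__ = OLD_MODULE_NAME
old_module.PromptDataset = PromptDataset
sys.modules[OLD_MODULE_NAME] = old_module

# Pickle records only "old_utils_pkg" + "PromptDataset", not the class body.
payload = pickle.dumps(PromptDataset(["hello"]))

# Simulate the refactoring: the old module path no longer exists.
del sys.modules[OLD_MODULE_NAME]


def try_load(data):
    """Return the unpickled object, or the import error if the module is gone."""
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as exc:
        return exc


result = try_load(payload)
print(type(result).__name__, result)
```

If this is what happened here, removing the cached dataset files so they are regenerated with the new module layout might resolve the error. I have not verified this on the setup above.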

> Another thing after the refactoring is that the train.py file in the applications/DeepSpeed-Chat/ folder is missing but the README still uses it, which is very misleading.

This also confused me for a while. Maybe you could open a separate issue for it.