microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Unstable inference of DeepSpeed-fine-tuned Stanford Alpaca #3107

Open XinliYu opened 1 year ago

XinliYu commented 1 year ago

We fine-tuned Alpaca on a single node with torchrun and on multiple nodes with DeepSpeed, following their recommended setup for inference.

The typical erroneous behavior we observed for the DeepSpeed-fine-tuned model is that it repeats the prompt and then just stops.

The DeepSpeed config is shown below. We simply added a --deepspeed argument to the torchrun command line referencing this configuration, and removed the conflicting FSDP arguments from https://github.com/tatsu-lab/stanford_alpaca.

{
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "steps_per_print": 100,
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 2e-5,
      "weight_decay": 0.0
    }
  },
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "wall_clock_breakdown": false,
  "zero_allow_untested_optimizer": true
}
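The two "auto" entries in this config are placeholders: when the config is passed through the Hugging Face Trainer's --deepspeed argument, they are filled in from the matching command-line arguments. The sketch below only illustrates that substitution. `resolve_auto` is a hypothetical helper written for this example, not the Trainer's actual code, and only top-level keys are handled.

```python
import copy


def resolve_auto(config, values):
    """Return a copy of `config` with top-level "auto" entries replaced.

    Hypothetical helper for illustration; not part of DeepSpeed or
    Hugging Face Transformers.
    """
    resolved = copy.deepcopy(config)
    for key, value in resolved.items():
        if value == "auto":
            if key not in values:
                raise KeyError(f"no value supplied for 'auto' key: {key}")
            resolved[key] = values[key]
    return resolved


# Subset of the config above.
ds_config = {
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "steps_per_print": 100,
}

# Values taken from the training command in this post.
cli_values = {
    "gradient_accumulation_steps": 16,      # --gradient_accumulation_steps 16
    "train_micro_batch_size_per_gpu": 1,    # --per_device_train_batch_size 1
}

resolved = resolve_auto(ds_config, cli_values)
print(resolved)
```

Writing the resolved values out this way makes it easier to compare the DeepSpeed run against the single-node run setting by setting.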

The training command:

python -m torch.distributed.run --nproc_per_node=8 --nnode=2 --node_rank=0 --master_addr=xxx --master_port=9901 train.py \
    --data_path ./alpaca_data.json \
    --output_dir ./train_ouput_02 \
    --num_train_epochs 7 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 50 \
    --tf32 True \
    --deepspeed ds_config.json
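When comparing this run against a single-node run, it helps to write out the effective global batch size implied by the command above. The arithmetic below uses only the values from that command. The single-node settings are not given in this post, so they are not included.

```python
# Values taken from the training command above.
nproc_per_node = 8                 # --nproc_per_node=8
nnodes = 2                         # --nnode=2
per_device_train_batch_size = 1    # --per_device_train_batch_size 1
gradient_accumulation_steps = 16   # --gradient_accumulation_steps 16

# Total number of training processes (one per GPU).
world_size = nproc_per_node * nnodes

# Number of examples that contribute to each optimizer step.
effective_batch_size = (
    world_size * per_device_train_batch_size * gradient_accumulation_steps
)

print(f"world size: {world_size}")                      # world size: 16
print(f"effective batch size: {effective_batch_size}")  # effective batch size: 256
```

If the single-node run used a different effective batch size with the same learning rate and epoch count, the two models were not trained under equivalent conditions, which is worth ruling out before attributing the difference to DeepSpeed.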

We evaluated each fine-tuned model with the above inference setup multiple times on the same prompts. The model fine-tuned with torchrun on a single node is relatively stable. The model fine-tuned with DeepSpeed on multiple nodes is much less stable: its output can differ from one inference run to the next.
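Whether repeated runs are expected to differ depends on the decoding settings, which are not listed in this post. If the inference setup samples from the next-token distribution, some run-to-run variation is expected even from a well-trained model, while greedy decoding or a fixed seed removes it. The toy sketch below illustrates the difference. The token probabilities are made up for illustration and do not come from either model.

```python
import random

# Made-up next-token distribution, for illustration only.
next_token_probs = {"Algorithms": 0.55, "Explain": 0.25, "<eos>": 0.20}


def greedy(probs):
    """Pick the most likely token; same result on every call."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_many(probs, seed, count):
    """Draw `count` tokens using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample(probs, rng) for _ in range(count)]


print(greedy(next_token_probs))           # Algorithms
print(sample_many(next_token_probs, 0, 10))
print(sample_many(next_token_probs, 1, 10))

# The same seed reproduces the same draws.
same_seed_matches = (
    sample_many(next_token_probs, 0, 10) == sample_many(next_token_probs, 0, 10)
)
print(same_seed_matches)                  # True
```

Re-running both models with greedy decoding, or with a fixed seed, would separate ordinary sampling noise from a difference introduced during training.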

For example, take the prompt "Explain how algorithms can be used in educational institutions." Three responses from the torchrun-fine-tuned model:

Algorithms are mathematical processes that can be used to solve problems and make decisions. In educational institutions, algorithms can be used in a variety of ways. For example, algorithms can be used to grade student work, to personalize learning experiences, to generate recommendation systems, and to detect cheating. Algorithms can also be used to analyze large amounts of data to identify patterns and trends in student performance.

Algorithms can be used in educational institutions to automate certain processes, such as grading tests and homework, providing personalized learning recommendations, and helping students find resources related to their coursework. Algorithms can also be used to track student progress, identify areas of difficulty, and provide feedback to students and teachers.

Algorithms can be used in educational institutions to help with the tracking and management of student records, providing automated feedback and assessment, personalizing learning experiences, and automating administrative tasks.

Three responses from the DeepSpeed-fine-tuned model follow. In the first and third responses, the output just repeats the prompt.

Explain how algorithms can be used in educational institutions.

Algorithms can be used in educational institutions to streamline processes and make them more efficient. For example, algorithms can be used to grade tests and assignments quickly and accur, accurately. Algorithms can also be used to match students with appropriate tutors and to match students with suitable learning materials.

Explain how algorithms can be used in educational institutions.
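The failure mode shown in these three responses can be counted automatically over many runs. The sketch below is one possible way to do that. `is_prompt_echo` and `echo_rate` are hypothetical helpers written for this example, and the second response is shortened to its first sentence.

```python
def is_prompt_echo(prompt: str, response: str) -> bool:
    """Return True if the response is empty or only repeats the prompt.

    Whitespace and letter case are normalized before comparing.
    """
    normalized_prompt = " ".join(prompt.split()).lower()
    normalized_response = " ".join(response.split()).lower()
    return normalized_response in ("", normalized_prompt)


def echo_rate(prompt: str, responses: list) -> float:
    """Fraction of responses that only echo the prompt."""
    if not responses:
        return 0.0
    echoes = sum(is_prompt_echo(prompt, response) for response in responses)
    return echoes / len(responses)


prompt = "Explain how algorithms can be used in educational institutions."
deepspeed_responses = [
    "Explain how algorithms can be used in educational institutions.",
    "Algorithms can be used in educational institutions to streamline "
    "processes and make them more efficient.",
    "Explain how algorithms can be used in educational institutions.",
]

rate = echo_rate(prompt, deepspeed_responses)
print(f"echo rate: {rate:.2f}")  # echo rate: 0.67
```

Tracking this rate over a larger set of prompts gives a single number to compare between the torchrun and DeepSpeed checkpoints.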

We look forward to any helpful discussion on how to improve the DeepSpeed-fine-tuned model.

XinliYu commented 1 year ago

Here are some other examples.

Responses from the torchrun-fine-tuned model:

Can you tell me who sings the song "see you again"? The song "See You Again" is performed by Wiz Khalifa featuring Charlie Puth.

Has taylor swift been married? Taylor Swift and Joe Alwyn have been together for a few years now and while they have been very private about their relationship, they have been spotted out together on a few occasions. They have also been spotted holding hands and it looks like they are getting more serious. But has taylor swift been married? No, she hasn’t. She has never been married. She has also never been engaged. Despite the fact that she has been in a few relationships, she has never been in a long-term relationship. She has been linked to some of the hottest guys in Hollywood, but she has never been engaged or married. She is not currently dating anyone either. She is very private about her relationships and her personal life. It is possible that she is just not ready to settle down yet.

The name of longest river: 4. The longest river in the world is the Nile River, which is located in Africa and runs for about 4,258 miles. The river has a large number of tributaries, which combine to form the longest river in the world. The longest river in the world is the Nile River, which is located in Africa and runs for about 4,258 miles. The river has a large number of tributaries, which combine to form the longest river in the world.

Running exactly the same inference again, we get the following from the torchrun-fine-tuned model:

Can you tell me who sings the song "see you again"? The song "See You Again" is sung by Wiz Khalifa featuring Charlie Puth.

Has taylor swift been married? Yes, Taylor Swift has been married. She has been married to actor Tom Hanks' son, actor and musician Taylor Hicks, since 2019. The couple first met in 2016 when Swift served as a judge on Hicks' season of American Idol. They became engaged in December 2018 and tied the knot in a private ceremony in March 2019. Swift and Hicks have one son together, named August.

The name of longest river: 1. Nile 2. Amazon 3. Mississippi 4. Yangtze 5. Indus 6. Ganges 7. Yellow 8. Congo 9. Volga 10. Ural. The longest river in the world is the Nile River.

Responses from the DeepSpeed-fine-tuned model:

Can you tell me who sings the song "see you again"?

Has taylor swift been married? Yes, Taylor Swift has been married before. She was married to actor and comedian, Tom Hanks’ son, Colin Hanks in 2009. They got engaged in May 2009 and tied the knot in March 2010. However, the couple split in 2011 and their divorce was finalized in 2013.

The name of longest river: 2. The longest river in the world is the Nile River, which runs through Egypt and Sudan and is about 4,132 miles (6,630 kilometers) long. It is also the longest river in Africa. 3. The longest river in the United States is the Missouri River, which runs through 10 states and is about 2,341 miles (3,771 kilometers) long.

Running exactly the same inference again, we get the following from the DeepSpeed-fine-tuned model:

Can you tell me who sings the song "see you again"?

Has taylor swift been married? Has taylor swift been married? Taylor Swift and Jake Gyllenhaal | Photo: Getty Images

The name of longest river: 47. The longest river in the world is the River Nile, which runs for about 4,258 miles (6,853 kilometers) through the countries of Egypt and Sudan. It is the primary source of the Nile River. It is the longest river in the world. It is also the largest river in Africa in terms of discharge volume. The Nile is an important water source for many countries in the region, and it is the primary source of the Nile River.
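To put a number on "stable" versus "unstable" across repeated runs like the ones above, one option is a simple text-similarity score between the two outputs for the same prompt. The sketch below uses Python's standard `difflib`. `run_similarity` is a hypothetical helper, and character-level similarity is only a rough proxy, not a measure of answer correctness.

```python
from difflib import SequenceMatcher


def run_similarity(first: str, second: str) -> float:
    """Similarity in [0, 1] between two outputs for the same prompt."""
    return SequenceMatcher(None, first, second).ratio()


# The two torchrun answers to the "see you again" prompt from this comment.
torchrun_run_1 = (
    'The song "See You Again" is performed by Wiz Khalifa featuring Charlie Puth.'
)
torchrun_run_2 = (
    'The song "See You Again" is sung by Wiz Khalifa featuring Charlie Puth.'
)

score = run_similarity(torchrun_run_1, torchrun_run_2)
print(f"similarity between runs: {score:.2f}")
```

Averaging this score over many prompts for each checkpoint would allow a like-for-like comparison of run-to-run stability.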