shuxueslpi / chatGLM-6B-QLoRA

Uses the peft library for efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B, plus merging the LoRA model into the base model and 4-bit quantization.

How can the model be trained on multiple GPUs with deepspeed? #22

Open · RayneSun opened this issue 1 year ago

RayneSun commented 1 year ago

As the title says.

shuxueslpi commented 1 year ago

There are still a few issues with it; I'm debugging as well and will update as soon as I can.

yyqi17 commented 1 year ago

Here is the main replacement code that got DeepSpeed single-node multi-GPU training working (it replaces trainer = LoRATrainer and everything after it):

import deepspeed

# Build the DeepSpeed engine: it wraps the QLoRA model and creates a distributed
# DataLoader from train_dataset, using coll_fn as the collate function.
model_engine, optimizer, train_dataloader, _ = deepspeed.initialize(config=conf,
                                                                    model=model,
                                                                    model_parameters=model.parameters(),
                                                                    training_data=train_dataset,
                                                                    collate_fn=coll_fn)
model_engine.train()
for i_epoch in range(global_args.num_train_epochs):
    for micro_step, batch in enumerate(train_dataloader):
        # Move each micro-batch onto this process's GPU.
        input_ids = batch["input_ids"].to(model_engine.local_rank)
        labels = batch["labels"].to(model_engine.local_rank)

        outputs = model_engine.forward(input_ids=input_ids, labels=labels)
        loss = outputs[0]

        # DeepSpeed handles gradient accumulation, clipping and the optimizer step.
        model_engine.backward(loss)
        model_engine.step()

    # Save the adapter weights at the end of every epoch.
    save_dir = f'{global_args.output_dir}/{i_epoch}'
    model_engine.save_pretrained(save_dir)

Notes:

  1. Using the original DataCollatorForChatGLM directly as coll_fn causes problems here; coll_fn needs to be a standalone function (similar to DataCollatorForChatGLM.__call__; a sketch follows below).
  2. Model loading reuses the loading code from the official script.

Finally, change the python launcher in train.sh to deepspeed and it runs.
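
For reference, here is a minimal sketch of what such a standalone collate function might look like, assuming each sample already holds tokenized input_ids and labels lists and that a tokenizer object is in scope; the names and padding details are illustrative, not the repo's exact implementation:

import functools
import torch

# Illustrative standalone collator: right-pads input_ids with the tokenizer's
# pad id and labels with -100 so the padded positions are ignored by the loss.
def collate_for_chatglm(batch, pad_token_id):
    max_len = max(len(sample['input_ids']) for sample in batch)
    input_ids, labels = [], []
    for sample in batch:
        pad_len = max_len - len(sample['input_ids'])
        input_ids.append(list(sample['input_ids']) + [pad_token_id] * pad_len)
        labels.append(list(sample['labels']) + [-100] * pad_len)
    return {
        'input_ids': torch.tensor(input_ids, dtype=torch.long),
        'labels': torch.tensor(labels, dtype=torch.long),
    }

# Bind the pad id so the result matches the collate_fn(batch) signature
# expected by deepspeed.initialize.
coll_fn = functools.partial(collate_for_chatglm, pad_token_id=tokenizer.pad_token_id)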

yyqi17 commented 1 year ago


Is this conf the lora_config?

No, conf is the DeepSpeed config, for example something like this:

conf = {
    "train_micro_batch_size_per_gpu": args.per_device_train_batch_size,
    "gradient_accumulation_steps": args.gradient_accumulation_steps,
    "gradient_clipping": 1.0,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": args.learning_rate,
            "betas": [0.9, 0.95],
            "eps": 1e-8,
            "weight_decay": args.weight_decay
        }
    },
    "fp16": {
        "enabled": False
    },
    "zero_optimization": {
        "stage": args.zero_stage,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True
        },
        "allgather_partitions": True,
        "allgather_bucket_size": 2e8,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": True
    },
}
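
One assumption worth flagging: conf reads several values from an args object. The batch-size, accumulation, learning-rate and weight-decay fields map to standard training arguments, while zero_stage looks like an extra option specific to this setup; if your argument parser doesn't define it, a minimal addition could look like this (the flag name simply mirrors what conf uses):

# Assumed extra CLI argument for the ZeRO stage read by conf["zero_optimization"]["stage"].
parser.add_argument('--zero_stage', type=int, default=2,
                    help='DeepSpeed ZeRO optimization stage (0-3)')
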
WellWang-S commented 1 year ago


Multi-GPU training fails with an error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices. Have you run into this?

yyqi17 commented 1 year ago

When I hit that error it came from the model-loading part, i.e. the model = xxxModel() code that runs before this block; you could check whether model_device_map is set correctly there.
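
In case it helps others hitting the same mismatch: one common cause in QLoRA multi-GPU runs is loading the base model with device_map='auto', which shards the weights across GPUs inside one process, while the data-parallel DeepSpeed setup above expects the full model on each local rank. A hedged sketch of pinning the 4-bit load to the local rank (the model id and quantization settings are illustrative, not necessarily what the repo uses):

import os
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Put the whole 4-bit base model on this process's GPU instead of letting
# device_map='auto' spread it over several devices.
local_rank = int(os.environ.get('LOCAL_RANK', 0))

model = AutoModel.from_pretrained(
    'THUDM/chatglm2-6b',                      # illustrative model id
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    ),
    device_map={'': local_rank},              # everything on this rank's GPU
)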