Fine-tuning the Baichuan 13B-Chat models with DeepSpeed ZeRO-3 has no effect (tried both Baichuan-13b-chat and Baichuan2-13b-chat)

liuhao-0666 commented 1 year ago

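Many thanks to hiyouga for this project. I have run into a problem and would appreciate your advice.

Problem

After fine-tuning Baichuan2-13b-chat with DeepSpeed ZeRO-3, the fine-tuning has no effect (Baichuan-13b-chat shows the same problem). Training on a single GPU does take effect, although the answers are inaccurate (possibly a dataset or hyperparameter issue, which I won't trouble you with here).

Parameters and procedure

DeepSpeed ZeRO-3

Here are the exact parameters and steps for your review: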

Deepspeed3.json

{
"bf16": {
    "enabled": false
},
"fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
},
"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": "auto",
        "betas": "auto",
        "eps": "auto",
        "weight_decay": "auto"
    }
},
"scheduler": {
    "type": "WarmupDecayLR",
    "params": {
        "last_batch_iteration": -1,
        "total_num_steps": "auto",
        "warmup_min_lr": "auto",
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto"
    }
},
"zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "offload_param": {
        "device": "cpu",
        "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 2e9,
    "stage3_max_reuse_distance": 2e9,
    "stage3_gather_16bit_weights_on_model_save": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 2000,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}

DeepSpeed training arguments

deepspeed -i localhost:6,7 src/train_bash.py \
--stage sft \
--model_name_or_path /media/data/llm/Baichuan2-13B-Chat \
--do_train \
--dataset global_managed_cloud_white_paper,loar_tuning_data_from_chatgpt,shang_hai_he_dan_datacenter_qa \
--template baichuan2 \
--finetuning_type lora \
--lora_target W_pack \
--output_dir /media/data/llm/output/2023-09-07-01-base-baichuan-13b-chat-checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--learning_rate 1e-4 \
--num_train_epochs 5.0 \
--plot_loss \
--fp16 \
--deepspeed deepspeed3.json

Rename the LoRA parameter keys as described in https://github.com/hiyouga/LLaMA-Efficient-Tuning/issues/446

import os
import torch

# Keep the original adapter weights as a backup.
os.rename("adapter_model.bin", "adapter_model.bin.bak")
state_dict = torch.load("adapter_model.bin.bak", map_location="cpu")
# Strip the "base_model.model." prefix from every parameter name.
new_state_dict = {k.replace("base_model.model.", ""): v for k, v in state_dict.items()}
torch.save(new_state_dict, "adapter_model.bin")

Run cli_demo.py

CUDA_VISIBLE_DEVICES=2,3 python src/cli_demo.py \
--model_name_or_path /media/data/llm/Baichuan2-13B-Chat \
--template baichuan2 \
--finetuning_type lora \
--checkpoint_dir /media/data/llm/output/2023-09-07-01-base-baichuan-13b-chat-checkpoint/checkpoint-200

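Test result: the question “外高桥机房的耐火等级是什么？” ("What is the fire-resistance rating of the Waigaoqiao data center?") comes from my training set, but the fine-tuned model cannot answer it, so the fine-tuning did not take effect.

Single-GPU training

For comparison, here are the exact parameters and steps for the single-GPU run:

Single-GPU training arguments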

CUDA_VISIBLE_DEVICES=4 python3 src/train_bash.py \
--stage sft \
--model_name_or_path /media/data/llm/Baichuan2-13B-Chat \
--do_train \
--dataset global_managed_cloud_white_paper,loar_tuning_data_from_chatgpt,shang_hai_he_dan_datacenter_qa \
--template baichuan2 \
--finetuning_type lora \
--lora_target W_pack \
--output_dir /media/data/llm/output/2023-09-07-02-base-baichuan-13b-chat-checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--learning_rate 1e-4 \
--num_train_epochs 5.0 \
--plot_loss \
--fp16 \
--quantization_bit 8
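
When comparing the two runs, it may help to note that they do not share the same effective batch size. The sketch below is plain arithmetic on the flags shown in the two commands, assuming ordinary data parallelism (one process per GPU); the function name is illustrative. The single-GPU run also adds --quantization_bit 8, so batch size is not the only difference between the runs.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Samples consumed per optimizer step under plain data parallelism."""
    return per_device_batch * grad_accum_steps * num_gpus


# Values taken from the two commands in this post.
zero3_two_gpus = effective_batch_size(4, 4, 2)  # deepspeed -i localhost:6,7
single_gpu = effective_batch_size(4, 4, 1)      # CUDA_VISIBLE_DEVICES=4

print("ZeRO-3, 2 GPUs:", zero3_two_gpus)
print("Single GPU:    ", single_gpu)
```

Under this assumption, checkpoint-200 of the two-GPU run has seen twice as many samples as checkpoint-200 of the single-GPU run, and each epoch has half as many optimizer steps.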

Run cli_demo.py

CUDA_VISIBLE_DEVICES=2,3 python src/cli_demo.py \
--model_name_or_path /media/data/llm/Baichuan2-13B-Chat \
--template baichuan2 \
--finetuning_type lora \
--checkpoint_dir /media/data/llm/output/2023-09-07-02-base-baichuan-13b-chat-checkpoint/checkpoint-200

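Test result: setting aside whether the answer is correct, the model trained on a single GPU does answer in the way it was taught, so the fine-tuning took effect.

Training-set data for this Q&A pair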

hiyouga commented 1 year ago

Are the two adapter_model.bin files the same size?

liuhao-0666 commented 1 year ago

Strangely, no. I forgot to mention this: the two adapter_model.bin files differ in size.

The adapter_model.bin from DeepSpeed fine-tuning is 13M.

The adapter_model.bin from single-GPU fine-tuning is 26M.
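
One way to narrow down the size gap is to compare the tensor count, parameter count, and dtypes of the two files. The sketch below is a minimal helper, assuming both files are ordinary PyTorch state dicts that torch.load can read; the function names and the paths in the usage comment are hypothetical. If the parameter counts match and only the dtypes differ (for example float16 versus float32), that alone would account for a factor of two, but this is a hypothesis to check, not a confirmed cause.

```python
from collections import Counter


def summarize_state_dict(state_dict):
    """Return tensor count, parameter count, byte size and dtype histogram."""
    tensors = list(state_dict.values())
    return {
        "tensors": len(tensors),
        "params": sum(t.numel() for t in tensors),
        "bytes": sum(t.numel() * t.element_size() for t in tensors),
        "dtypes": dict(Counter(str(t.dtype) for t in tensors)),
    }


def summarize_file(path):
    """Load a checkpoint on CPU and summarize it (requires PyTorch)."""
    import torch  # imported lazily so the helper above has no hard dependency

    return summarize_state_dict(torch.load(path, map_location="cpu"))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths):
    #   python compare_adapters.py zero3/adapter_model.bin single/adapter_model.bin
    for path in sys.argv[1:]:
        print(path, summarize_file(path))
```

Comparing the "params" and "dtypes" fields side by side shows whether the smaller file is missing tensors or merely stores the same tensors at lower precision.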

lance691991 commented 1 year ago

Has this been solved? I have hit the same problem, also with multi-GPU ZeRO-3 training.

liuhao-0666 commented 1 year ago

Not yet. I am going to run another experiment with ZeRO-3 on a single GPU.

lance691991 commented 1 year ago

Hi, I solved it on my side: add "stage3_gather_16bit_weights_on_model_save": true to the DeepSpeed config, and after training just run export_model directly.
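
The setting above can be checked before launching a run. This is a minimal sketch, assuming the DeepSpeed config is a plain JSON file; the function names are illustrative and not part of DeepSpeed or this project. As the DeepSpeed documentation describes it, this option gathers the partitioned 16-bit weights when the model is saved under ZeRO-3.

```python
import json


def check_zero3_save_config(ds_config):
    """Return human-readable warnings about weight saving under ZeRO-3."""
    zero = ds_config.get("zero_optimization", {})
    warnings = []
    if zero.get("stage") == 3:
        if zero.get("stage3_gather_16bit_weights_on_model_save") is not True:
            warnings.append(
                "stage3_gather_16bit_weights_on_model_save is not true: "
                "saved weights may not contain the gathered parameters"
            )
    return warnings


def check_file(path):
    """Load a DeepSpeed JSON config from disk and check it."""
    with open(path, encoding="utf-8") as f:
        return check_zero3_save_config(json.load(f))


# Inline example mirroring the relevant part of the configs in this thread.
example = {
    "zero_optimization": {
        "stage": 3,
        "stage3_gather_16bit_weights_on_model_save": True,
    }
}
print(check_zero3_save_config(example))
```

Calling check_file on the JSON file passed to --deepspeed returns an empty list when the flag is already set.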

lance691991 commented 1 year ago

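Regarding step 3 (renaming the LoRA parameter keys as described in #446, "deepspeed(zero3) + LoRA fine-tuning, model export problem"): that step is no longer needed either.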

liuhao-0666 commented 1 year ago

Thanks, I'll give it a try.

liuhao-0666 commented 1 year ago

stage3_gather_16bit_weights_on_model_save

Could you post your dsconfig.json for reference? Thanks.

liuhao-0666 commented 1 year ago

It seems to work now. I think a code update fixed the problem.

steamfeifei commented 1 year ago

Which part of the code was updated?

liuhao-0666 commented 1 year ago

I pulled the latest code today, but adapter_model.bin is still not quite the same as the one from single-GPU fine-tuning.

steamfeifei commented 1 year ago

Received, thank you!!

liuhao-0666 commented 1 year ago

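However, following https://github.com/hiyouga/LLaMA-Efficient-Tuning/issues/837#issuecomment-1713172688, there is indeed no need to rename the LoRA weights or to merge the model separately with zero_to_fp32.py. You can export the model directly with export_model.py, or point cli_demo.py at the checkpoint, and it runs.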

steamfeifei commented 1 year ago

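But your original config already had this setting! Also, I am still on the code I pulled last Thursday and have not updated it, and with this setting I don't need zero_to_fp32.py either.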

steamfeifei commented 1 year ago

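With the ZeRO stage set to 3, the trained model seems to show no effect at all. Another thing: when I ask a question, the model sometimes generates "Human" content on its own. Is that normal?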

steamfeifei commented 1 year ago

Regarding your plan to try ZeRO-3 on a single GPU: have you tried it yet, and did it work?

steamfeifei commented 1 year ago

@hiyouga

hiyouga commented 1 year ago

@steamfeifei When training a Chat model, you must use the correct template.

steamfeifei commented 1 year ago

The template is fine:

deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
    --stage sft \
    --model_name_or_path baichuan-inc/Baichuan2-13B-Chat \
    --do_train \
    --dataset example1 \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir saves/Baichuan2-13B-Chat/lora/2023-09-11-18-21-37-epoch10 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 1e-4 \
    --num_train_epochs 10.0 \
    --plot_loss \
    --fp16 \
    --deepspeed ds_config.json

hiyouga commented 1 year ago

@steamfeifei Did you also pass the template at inference time?

steamfeifei commented 1 year ago

Emm, how embarrassing, I didn't! Let me try.

steamfeifei commented 1 year ago

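@hiyouga With the stage set to 2, both multi-GPU and single-GPU training run out of memory. With the stage set to 3, training runs, but the result still shows no effect. Here is my configuration for stage 3: deepspeed_config.json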

{
    "bf16": {
        "enabled": false
    },
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "last_batch_iteration": -1,
            "total_num_steps": "auto",
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 2e9,
        "stage3_max_reuse_distance": 2e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

train.sh

deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
    --stage sft \
    --model_name_or_path baichuan-inc/Baichuan2-13B-Chat \
    --do_train \
    --dataset example1 \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir saves/Baichuan2-13B-Chat/lora/2023-09-11-18-21-37-epoch10 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 1e-4 \
    --num_train_epochs 10.0 \
    --plot_loss \
    --fp16 \
    --deepspeed deepspeed_config.json

lance691991 commented 1 year ago

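You can check the DeepSpeed documentation for the definition of each stage. Stage 2, as far as I know, only partitions the gradients (along with the optimizer states) across GPUs, not the model parameters, which is why it runs out of memory.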
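
To make the stage difference concrete, the sketch below estimates per-GPU memory for model states only, using the usual mixed-precision Adam accounting from the ZeRO design (2 bytes per fp16 weight, 2 bytes per fp16 gradient, 12 bytes of fp32 optimizer state per trainable parameter). It is a rough illustration, not a measurement: activations, buffers, and CPU offload are ignored, the function name is illustrative, and the trainable-parameter count for the LoRA adapter is a made-up placeholder.

```python
def model_state_bytes_per_gpu(stage, total_params, trainable_params, num_gpus):
    """Rough per-GPU bytes for model states under mixed-precision Adam.

    Every parameter costs 2 bytes of fp16 weights. Trainable parameters
    additionally cost 2 bytes of fp16 gradients and 12 bytes of fp32
    optimizer state (master weights, momentum, variance).
    """
    weights = 2 * total_params
    grads = 2 * trainable_params
    optim = 12 * trainable_params
    if stage >= 1:  # stage 1 partitions the optimizer states
        optim /= num_gpus
    if stage >= 2:  # stage 2 also partitions the gradients
        grads /= num_gpus
    if stage >= 3:  # stage 3 also partitions the parameters
        weights /= num_gpus
    return weights + grads + optim


TOTAL = 13e9      # roughly 13B parameters
TRAINABLE = 6e6   # hypothetical LoRA adapter size, for illustration only

for stage in (2, 3):
    gib = model_state_bytes_per_gpu(stage, TOTAL, TRAINABLE, num_gpus=8) / 2**30
    print(f"stage {stage}: about {gib:.1f} GiB of model states per GPU")
```

With LoRA, almost all of the memory is the frozen fp16 base weights, which stage 2 replicates on every GPU and stage 3 partitions.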

Here is my ds_config for reference. Also, what was your final training loss?

{
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 3,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e8,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "overlap_comm": false,
        "contiguous_gradients": true,
        "stage3_gather_16bit_weights_on_model_save": true
    }
}
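
Since one config reportedly works and the other does not, listing the exact differences between them may be a useful next step. The sketch below is a generic JSON diff, assuming both configs are saved as files; the helper names are illustrative, and the file names in the usage comment are the ones mentioned in this thread.

```python
import json

MISSING = "<missing>"


def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every differing key."""
    a, b = flatten(left), flatten(right)
    return {
        key: (a.get(key, MISSING), b.get(key, MISSING))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, MISSING) != b.get(key, MISSING)
    }


if __name__ == "__main__":
    import sys

    # Usage: python diff_ds_config.py deepspeed_config.json ds_config.json
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as f1, open(sys.argv[2], encoding="utf-8") as f2:
            for key, (lhs, rhs) in diff_configs(json.load(f1), json.load(f2)).items():
                print(f"{key}: {lhs!r} -> {rhs!r}")
```

For the two configs posted here, the output would include the offload_optimizer and offload_param entries, the explicit optimizer and scheduler blocks, and overlap_comm, which narrows down what to toggle first.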

steamfeifei commented 1 year ago

My loss is also fairly small. Does your setup actually take effect?

lance691991 commented 1 year ago

Yes, mine takes effect.

steamfeifei commented 1 year ago

OK, I'll try it.

steamfeifei commented 1 year ago

Still not working. I trained for 10 epochs and the loss is 2.12. The responses have not changed at all.

lance691991 commented 1 year ago

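Try a larger learning rate and train until the loss is very small and the model overfits, then test with samples from the training set to see whether there is any effect.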

liuhao-0666 commented 1 year ago

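Before I pulled the latest code there was no effect, whether I ran DeepSpeed ZeRO-3 on one GPU or on several. After pulling, exporting directly with export_model took effect (I pulled on Monday, September 11). That said, although the fine-tuning now takes effect, the answers are not accurate, so I plan to rerun the experiment going from pre-training through SFT.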

steamfeifei commented 1 year ago

OK, please share the results once you have tested. I will also update to the latest code and try.

steamfeifei commented 1 year ago

Could you share your parameters? After updating the code, I still see no effect.

liuhao-0666 commented 1 year ago

deepspeed.conf

{
"bf16": {
    "enabled": false
},
"fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
},
"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": "auto",
        "betas": "auto",
        "eps": "auto",
        "weight_decay": "auto"
    }
},
"scheduler": {
    "type": "WarmupDecayLR",
    "params": {
        "last_batch_iteration": -1,
        "total_num_steps": "auto",
        "warmup_min_lr": "auto",
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto"
    }
},
"zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "offload_param": {
        "device": "cpu",
        "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 2e9,
    "stage3_max_reuse_distance": 2e9,
    "stage3_gather_16bit_weights_on_model_save": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 2000,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}

DeepSpeed training arguments

deepspeed -i localhost:6,7 src/train_bash.py \
--stage sft \
--model_name_or_path /media/data/llm/Baichuan2-13B-Chat \
--do_train \
--dataset global_managed_cloud_white_paper,loar_tuning_data_from_chatgpt,shang_hai_he_dan_datacenter_qa \
--template baichuan2 \
--finetuning_type lora \
--lora_target W_pack \
--output_dir /media/data/llm/output/2023-09-07-01-base-baichuan-13b-chat-checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--learning_rate 1e-4 \
--num_train_epochs 5.0 \
--plot_loss \
--fp16 \
--deepspeed deepspeed3.json

sunlei198911 commented 11 months ago

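I also noticed that the file sizes differ between multi-GPU and single-GPU training: the multi-GPU file is half the size of the single-GPU one, and your screenshots above show the same. Do the results of your multi-GPU training look normal to you?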
