wangyuxinwhy / uniem

unified embedding model
Apache License 2.0

Single-machine multi-GPU run fails with "has parameters that were not used in producing loss" #89

Open whi497 opened 1 year ago

whi497 commented 1 year ago

🐛 Bug description

Command used:

CUDA_VISIBLE_DEVICES=2,3 accelerate launch --num_processes 2 path_to_train_m3e.py path_to_model path_to_dataset \
    --output-dir output_dir

Error message:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 0: 197 198
In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

Full log: log.txt

How can this be resolved? Thank you.

Python Version

3.11
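
The error log's own suggestion, setting TORCH_DISTRIBUTED_DEBUG=INFO or DETAIL, is the quickest way to see which parameters received no gradient. A small helper along these lines gives the same information in-process after a backward pass; this is a sketch with an invented name, not part of uniem:

import torch

def report_unused_parameters(model: torch.nn.Module) -> list[str]:
    # Names of trainable parameters that received no gradient in the last
    # backward pass, i.e. the parameters DDP's reducer is waiting for.
    return [name for name, p in model.named_parameters()
            if p.requires_grad and p.grad is None]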

wangyuxinwhy commented 1 year ago

You need to configure the Accelerator; the error message is actually quite explicit. Make the following change in the original training code:

# import this
from accelerate import DistributedDataParallelKwargs

accelerator = Accelerator(
    mixed_precision=mixed_precision.value,
    gradient_accumulation_steps=gradient_accumulation_steps,
    project_config=project_config,
    log_with=['tensorboard'] if use_tensorboard else None,
    dispatch_batches=True,
    split_batches=True,
    # add the line below
    kwargs_handlers=[DistributedDataParallelKwargs(find_unused_parameters=True)],
)

After adding this, re-running works.

whi497 commented 1 year ago

Got it. This morning I saw that finetune_jsonl.py in the examples runs fine, so I set the parameter the same way and it worked. Thanks!

whi497 commented 1 year ago

That said, this flag only seems to turn on detection of unused parameters. Why does adding it fix the problem?

wangyuxinwhy commented 1 year ago

I just use it the way the documentation describes and haven't looked into it further. I mostly use FSDP now, so I haven't paid much attention to DDP.
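
For readers with the same question: with find_unused_parameters=True, DDP traverses the autograd graph from the forward output to find parameters that will receive no gradient and marks them as ready, so the reducer stops waiting for them at the start of the next iteration. A self-contained toy sketch of the situation, not code from uniem and with all names invented:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "distributed" setup so the sketch runs anywhere on CPU.
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29501')
dist.init_process_group('gloo', rank=0, world_size=1)

class TwoHead(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(8, 8)
        self.unused_head = torch.nn.Linear(8, 2)  # defined but never used below

    def forward(self, x):
        return self.encoder(x)  # unused_head contributes nothing to the loss

# In a multi-GPU run without find_unused_parameters=True, the second iteration
# fails with the "Expected to have finished reduction in the prior iteration"
# error, because the reducer keeps waiting for unused_head's gradients.
model = DDP(TwoHead(), find_unused_parameters=True)
for _ in range(2):
    model(torch.randn(4, 8)).sum().backward()

dist.destroy_process_group()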

susht3 commented 5 months ago

> Got it. This morning I saw that finetune_jsonl.py in the examples runs fine, so I set the parameter the same way and it worked. Thanks!

About finetune_jsonl.py: can this script run on the GPU? My GPU doesn't seem to be used; it still runs on the CPU.
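
A quick, general check (a sketch, not specific to finetune_jsonl.py) of whether the run actually sees the GPU:

import torch
from accelerate import Accelerator

print('CUDA available:', torch.cuda.is_available())
print('GPU count:', torch.cuda.device_count())
print('Accelerate device:', Accelerator().device)  # 'cpu' here means the GPU is not being picked up

If this prints cpu, likely causes are a CPU-only torch build, an empty CUDA_VISIBLE_DEVICES, or an accelerate config that selects CPU.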

liangtianxin commented 2 months ago

import os
import sys
import torch
import pandas as pd
from uniem.finetuner import FineTuner
from accelerate import DistributedDataParallelKwargs  # added for multi-GPU training

if __name__ == '__main__':
    path_train_data = sys.argv[1]
    path_pretrain_model = sys.argv[2]
    path_output = sys.argv[3]

    df = pd.read_json(path_train_data, lines=True)
    finetuner = FineTuner.from_pretrained(path_pretrain_model, dataset=df.to_dict('records'))
    # accelerator_kwargs added for multi-GPU training
    fintuned_model = finetuner.run(
        epochs=10, batch_size=128, lr=3e-5, max_length=64, output_dir=path_output,
        accelerator_kwargs={'kwargs_handlers': [DistributedDataParallelKwargs(find_unused_parameters=True)]},
    )
    torch.save(fintuned_model.state_dict(), os.path.join(path_output, 'model', 'pytorch_model.bin'))
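
Assuming the script above is saved as, say, finetune_jsonl_multi_gpu.py (a hypothetical name), it would be launched the same way as the command in the original report, e.g. CUDA_VISIBLE_DEVICES=2,3 accelerate launch --num_processes 2 finetune_jsonl_multi_gpu.py path_to_train_data path_to_pretrain_model path_to_output, where the three positional arguments map to sys.argv[1..3] in the script.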