# nvidia-smi info:
[shibingli@loaclhost ~]$ sudo docker run --gpus=all --runtime=nvidia --rm -it -v /data/:/data/ -v /data/agi/Chinese-LLaMA-Alpaca-Docker/envs/:/opt/app/envs/ rl-agi:latest bash
[sudo] password for shibingli:
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
root@d91c734b9499:/opt/app# nvidia-smi
Fri Jul 7 09:23:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A800 80GB PCIe Off | 00000000:12:00.0 Off | 0 |
| N/A 34C P0 42W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A800 80GB PCIe Off | 00000000:13:00.0 Off | 0 |
| N/A 33C P0 44W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A800 80GB PCIe Off | 00000000:14:00.0 Off | 0 |
| N/A 35C P0 47W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A800 80GB PCIe Off | 00000000:48:00.0 Off | 0 |
| N/A 34C P0 42W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A800 80GB PCIe Off | 00000000:49:00.0 Off | 0 |
| N/A 34C P0 43W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A800 80GB PCIe Off | 00000000:89:00.0 Off | 0 |
| N/A 34C P0 44W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A800 80GB PCIe Off | 00000000:8A:00.0 Off | 0 |
| N/A 34C P0 42W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A800 80GB PCIe Off | 00000000:C0:00.0 Off | 0 |
| N/A 34C P0 45W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 8 NVIDIA A800 80GB PCIe Off | 00000000:C1:00.0 Off | 0 |
| N/A 33C P0 44W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 9 NVIDIA A800 80GB PCIe Off | 00000000:C2:00.0 Off | 0 |
| N/A 34C P0 44W / 300W | 18MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
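A side note on the launch command above: the container is started without an explicit shared-memory size, and Docker's default /dev/shm is only 64 MB, which multi-process training can exhaust and which can surface as SIGBUS. Below is a minimal sketch of the same launch with a larger /dev/shm; the 64g value (and the --ipc=host alternative mentioned in the comment) is an assumption, not part of the original setup.

```bash
# Sketch: same launch as above, but with an explicit shared-memory size.
# The 64g value is an assumption; --ipc=host (sharing the host's /dev/shm)
# is a common alternative.
sudo docker run --gpus=all --runtime=nvidia --rm -it \
  --shm-size=64g \
  -v /data/:/data/ \
  -v /data/agi/Chinese-LLaMA-Alpaca-Docker/envs/:/opt/app/envs/ \
  rl-agi:latest bash
```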
Run logs or screenshots
# Run log
root@d91c734b9499:/opt/app# bash /data/agi/Chinese-LLaMA-Alpaca/scripts/training/run_pt.sh
[2023-07-07 09:29:01,360] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:01,364] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:01,365] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:01,365] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:01,398] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:01,414] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:01,417] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-07 09:29:03,691] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,691] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-07 09:29:03,785] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,785] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-07 09:29:03,853] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,854] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-07 09:29:03,862] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,862] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-07 09:29:03,864] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,864] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-07 09:29:03,864] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-07 09:29:03,867] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,867] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-07 09:29:03,871] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-07 09:29:03,871] [INFO] [comm.py:594:init_distributed] cdb=None
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
07/07/2023 09:29:04 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True
07/07/2023 09:29:04 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True
07/07/2023 09:29:04 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
[INFO|configuration_utils.py:667] 2023-07-07 09:29:04,229 >> loading configuration file /data/agi/LoRA/hf/7B_Llama_Plus/config.json
[INFO|configuration_utils.py:725] 2023-07-07 09:29:04,229 >> Model config LlamaConfig {
"_name_or_path": "/data/agi/LoRA/hf/7B_Llama_Plus",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"pad_token_id": 0,
"rms_norm_eps": 1e-06,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.30.2",
"use_cache": true,
"vocab_size": 49953
}
[INFO|tokenization_utils_base.py:1821] 2023-07-07 09:29:04,230 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1821] 2023-07-07 09:29:04,230 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1821] 2023-07-07 09:29:04,230 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1821] 2023-07-07 09:29:04,230 >> loading file tokenizer_config.json
07/07/2023 09:29:04 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: True
07/07/2023 09:29:04 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: True
07/07/2023 09:29:04 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: True
07/07/2023 09:29:05 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True
07/07/2023 09:29:05 - INFO - datasets.builder - Using custom data configuration default-4e021b6fe6b72b11
07/07/2023 09:29:05 - INFO - datasets.info - Loading Dataset Infos from /opt/app/envs/venv_peft_13e53fc/lib/python3.10/site-packages/datasets/packaged_modules/text
07/07/2023 09:29:05 - INFO - datasets.builder - Generating dataset text (/data/agi/nh/cache/txt/knowledge_text/text/default-4e021b6fe6b72b11/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)
Downloading and preparing dataset text/default to /data/agi/nh/cache/txt/knowledge_text/text/default-4e021b6fe6b72b11/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2...
Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 11275.01it/s]
07/07/2023 09:29:05 - INFO - datasets.download.download_manager - Downloading took 0.0 min
07/07/2023 09:29:05 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1593.58it/s]
07/07/2023 09:29:05 - INFO - datasets.builder - Generating train split
07/07/2023 09:29:05 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.
Dataset text downloaded and prepared to /data/agi/nh/cache/txt/knowledge_text/text/default-4e021b6fe6b72b11/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2. Subsequent calls will reuse this data.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 848.53it/s]
07/07/2023 09:29:05 - INFO - __main__ - knowledge.txt has been loaded
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #0 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00000_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #1 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00001_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #2 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00002_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #3 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00003_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #4 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00004_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #5 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00005_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #6 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00006_of_00008.arrow
07/07/2023 09:29:05 - INFO - datasets.arrow_dataset - Process #7 will write at /data/agi/nh/cache/txt/knowledge_text/tokenized_00007_of_00008.arrow
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1337 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 1336) of binary: /opt/app/envs/venv_peft_13e53fc/bin/python
Traceback (most recent call last):
File "/opt/app/envs/venv_peft_13e53fc/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/opt/app/envs/venv_peft_13e53fc/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/opt/app/envs/venv_peft_13e53fc/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/opt/app/envs/venv_peft_13e53fc/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/opt/app/envs/venv_peft_13e53fc/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/app/envs/venv_peft_13e53fc/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/data/agi/Chinese-LLaMA-Alpaca/scripts/training/run_clm_pt_with_peft.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-07-07_09:29:19
host : d91c734b9499
rank : 2 (local_rank: 2)
exitcode : -7 (pid: 1338)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 1338
[2]:
time : 2023-07-07_09:29:19
host : d91c734b9499
rank : 3 (local_rank: 3)
exitcode : -7 (pid: 1339)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 1339
[3]:
time : 2023-07-07_09:29:19
host : d91c734b9499
rank : 4 (local_rank: 4)
exitcode : -7 (pid: 1340)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 1340
[4]:
time : 2023-07-07_09:29:19
host : d91c734b9499
rank : 5 (local_rank: 5)
exitcode : -7 (pid: 1341)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 1341
[5]:
time : 2023-07-07_09:29:19
host : d91c734b9499
rank : 6 (local_rank: 6)
exitcode : -7 (pid: 1342)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 1342
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-07-07_09:29:19
host : d91c734b9499
rank : 0 (local_rank: 0)
exitcode : -7 (pid: 1336)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 1336
============================================================
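Every failed rank above exits with Signal 7 (SIGBUS). A couple of generic checks that could be run inside the container before retrying are sketched below; these are suggestions only, not steps taken in the original report.

```bash
# How large is the container's /dev/shm, and how full does it get during the run?
df -h /dev/shm

# Re-run with NCCL debug logging enabled to see how far communicator setup gets.
NCCL_DEBUG=INFO bash /data/agi/Chinese-LLaMA-Alpaca/scripts/training/run_pt.sh
```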
Items that must be checked before submitting
Issue type
Model training and fine-tuning
Base model
LLaMA-Plus-7B
Operating system
Linux
Detailed description of the problem
Training on a single machine with 10 GPUs fails. With torchrun's --nproc_per_node set to 2 the run works normally, but with any value greater than 2 it errors out (sketched below). Has anyone dealt with a similar problem? The versions of the dependency libraries all look normal, yet the training command fails whenever more than 2 GPUs are used. Memory info:
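To make the two configurations concrete, a rough sketch of the launch is given below; the script path is taken from the log above, the --nproc_per_node values from the description, and every other training argument is omitted (the actual run_pt.sh passes many more).

```bash
# Assumed shape of the torchrun launch inside run_pt.sh (training args omitted).
# Works on this machine:
torchrun --nnodes 1 --nproc_per_node 2 \
  /data/agi/Chinese-LLaMA-Alpaca/scripts/training/run_clm_pt_with_peft.py  # ...training args

# Fails with SIGBUS: any value greater than 2, e.g. all ten GPUs:
torchrun --nnodes 1 --nproc_per_node 10 \
  /data/agi/Chinese-LLaMA-Alpaca/scripts/training/run_clm_pt_with_peft.py  # ...training args
```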
Dependencies (must be provided for code-related issues)
Run logs or screenshots