Hello, I was trying to run instruction tuning of CodeT5+ and ran into this issue. The error message is:
(cjy_ct5) nlpir@nlpir-SYS-4028GR-TR:~/cjy/CodeT5/CodeT5+$ sh instruct_finetune.sh
Using CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
[2024-05-28 20:40:23,112] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] please install triton==1.0.0 if you want to use sparse attention
[2024-05-28 20:40:25,339] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-05-28 20:40:25,339] [INFO] [runner.py:568:main] cmd = /home/nlpir/miniconda3/envs/cjy_ct5/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None instruct_tune_codet5p.py --load baselines/codet5p-220m --save-dir saved_models/instructcodet5p-220m --instruct-data-path datasets/code_alpaca_20k.json --fp16 --deepspeed deepspeed_config.json
[2024-05-28 20:40:26,653] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] please install triton==1.0.0 if you want to use sparse attention
[2024-05-28 20:40:28,850] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [6, 7]}
[2024-05-28 20:40:28,850] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-05-28 20:40:28,850] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-05-28 20:40:28,850] [INFO] [launch.py:164:main] dist_world_size=2
[2024-05-28 20:40:28,850] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=6,7
[2024-05-28 20:40:28,860] [INFO] [launch.py:256:main] process 2307758 spawned with command: ['/home/nlpir/miniconda3/envs/cjy_ct5/bin/python', '-u', 'instruct_tune_codet5p.py', '--local_rank=0', '--load', 'baselines/codet5p-220m', '--save-dir', 'saved_models/instructcodet5p-220m', '--instruct-data-path', 'datasets/code_alpaca_20k.json', '--fp16', '--deepspeed', 'deepspeed_config.json']
[2024-05-28 20:40:28,867] [INFO] [launch.py:256:main] process 2307759 spawned with command: ['/home/nlpir/miniconda3/envs/cjy_ct5/bin/python', '-u', 'instruct_tune_codet5p.py', '--local_rank=1', '--load', 'baselines/codet5p-220m', '--save-dir', 'saved_models/instructcodet5p-220m', '--instruct-data-path', 'datasets/code_alpaca_20k.json', '--fp16', '--deepspeed', 'deepspeed_config.json']
{'batch_size_per_replica': 1,
'cache_data': 'cache_data/instructions',
'data_num': -1,
'deepspeed': 'deepspeed_config.json',
'epochs': 3,
'fp16': True,
'grad_acc_steps': 16,
'instruct_data_path': 'datasets/code_alpaca_20k.json',
'load': 'baselines/codet5p-220m',
'local_rank': 1,
'log_freq': 10,
'lr': 2e-05,
'lr_warmup_steps': 30,
'max_len': 512,
'save_dir': 'saved_models/instructcodet5p-220m',
'save_freq': 500}
==> Loaded 20022 samples
{'batch_size_per_replica': 1,
'cache_data': 'cache_data/instructions',
'data_num': -1,
'deepspeed': 'deepspeed_config.json',
'epochs': 3,
'fp16': True,
'grad_acc_steps': 16,
'instruct_data_path': 'datasets/code_alpaca_20k.json',
'load': 'baselines/codet5p-220m',
'local_rank': 0,
'log_freq': 10,
'lr': 2e-05,
'lr_warmup_steps': 30,
'max_len': 512,
'save_dir': 'saved_models/instructcodet5p-220m',
'save_freq': 500}
==> Loaded 20022 samples
==> Loaded model from baselines/codet5p-220m, model size 222882048
Para before freezing: 222882048, trainable para: 223M
Traceback (most recent call last):
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 210, in <module>
main(args)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 177, in main
freeze_decoder_except_xattn_codegen(model)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 42, in freeze_decoder_except_xattn_codegen
num_decoder_layers = model.decoder.config.n_layer
File "/home/nlpir/miniconda3/envs/cjy_ct5/lib/python3.9/site-packages/transformers/configuration_utils.py", line 257, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'T5Config' object has no attribute 'n_layer'
==> Loaded model from baselines/codet5p-220m, model size 222882048
Para before freezing: 222882048, trainable para: 223M
Traceback (most recent call last):
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 210, in <module>
main(args)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 177, in main
freeze_decoder_except_xattn_codegen(model)
File "/home/nlpir/cjy/CodeT5/CodeT5+/instruct_tune_codet5p.py", line 42, in freeze_decoder_except_xattn_codegen
num_decoder_layers = model.decoder.config.n_layer
File "/home/nlpir/miniconda3/envs/cjy_ct5/lib/python3.9/site-packages/transformers/configuration_utils.py", line 257, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'T5Config' object has no attribute 'n_layer'
[2024-05-28 20:40:32,872] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2307758
[2024-05-28 20:40:32,873] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2307759
[2024-05-28 20:40:32,905] [ERROR] [launch.py:325:sigkill_handler] ['/home/nlpir/miniconda3/envs/cjy_ct5/bin/python', '-u', 'instruct_tune_codet5p.py', '--local_rank=1', '--load', 'baselines/codet5p-220m', '--save-dir', 'saved_models/instructcodet5p-220m', '--instruct-data-path', 'datasets/code_alpaca_20k.json', '--fp16', '--deepspeed', 'deepspeed_config.json'] exits with return code = 1
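For context on where the crash happens: line 42 of `instruct_tune_codet5p.py` reads `model.decoder.config.n_layer`, which is a CodeGen-style config attribute, but the 220m checkpoint loads a `T5Config`, which names its decoder depth `num_decoder_layers` instead. A minimal sketch of a workaround, assuming this is the only mismatch (the helper name `get_num_decoder_layers` is my own, not from the repo):

```python
# Hypothetical helper (not part of the CodeT5+ repo): read the decoder
# depth from whichever attribute the loaded config actually defines.
def get_num_decoder_layers(config):
    # CodeGen-style configs (the 2B+ checkpoints) expose `n_layer`;
    # T5-style configs (codet5p-220m/770m) expose `num_decoder_layers`.
    for attr in ("n_layer", "num_decoder_layers", "num_layers"):
        if hasattr(config, attr):
            return getattr(config, attr)
    raise AttributeError("config defines no known decoder-depth attribute")
```

With something like this, `freeze_decoder_except_xattn_codegen` could call `get_num_decoder_layers(model.decoder.config)` instead of accessing `n_layer` directly; treat it as a sketch to check against the actual config of your checkpoint, not a confirmed fix.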
The content of my "instruct_finetune.sh" file is:

#!/bin/bash

export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH

echo "Using CUDA version:"
nvcc --version

MODEL_PATH="baselines/codet5p-220m"
SAVE_DIR="saved_models/instructcodet5p-220m"
DATA_PATH="datasets/code_alpaca_20k.json"

deepspeed --include localhost:6,7 instruct_tune_codet5p.py \
    --load $MODEL_PATH --save-dir $SAVE_DIR --instruct-data-path $DATA_PATH \
    --fp16 --deepspeed deepspeed_config.json
Could you please tell me what the problem is and how to solve it? Thank you!