Hey @baibaiw5 Thanks for the issue! Can you share the script you are using? Thanks!
Hi, I have no special script. I just run the demo with accelerate: accelerate launch clm_finetune_peft_imdb.py --output_dir chk https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt-neox-20b_peft/clm_finetune_peft_imdb.py
Ok I see, can you try to upgrade your peft library?
pip install --upgrade peft
Yes, I have tried two versions of peft: peft 0.2.0 (the current stable version) and peft 0.3.0.dev0 (installed from the git source), and many models. The following models are OK:
--EleutherAI/gpt-neo-125M
--EleutherAI/gpt-neo-2.7b
The other models fail (which are the candidate models):
--facebook/opt-125m
--gpt-neox-20b
Thanks @baibaiw5
I managed to reproduce. Can you add the following after the line that calls prepare_model_for_int8_training:
for name, param in pretrained_model.named_parameters():
    # freeze base model's layers
    param.requires_grad = False
    if getattr(pretrained_model, "is_loaded_in_8bit", False):
        if param.ndim == 1 and "layer_norm" in name:
            param.data = param.data.to(torch.float16)
A proper fix will be addressed in peft.
EDIT: rather than my proposed fix, you should just replace the line that calls prepare_model_for_int8_training with:
pretrained_model = prepare_model_for_int8_training(pretrained_model, layer_norm_names=[])
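For context, here is a rough, from-memory sketch of what the helper does with layer_norm_names in peft ~0.2.x (not the actual peft source, details may differ): it freezes the base model and upcasts matching 1-D parameters to float32 for training stability, and that upcast is exactly what produces the Half/Float mismatch when the rest of the model runs in fp16, hence passing an empty list.

import torch

def prepare_model_for_int8_training_sketch(model, layer_norm_names=("layer_norm",)):
    # freeze all base model parameters
    for name, param in model.named_parameters():
        param.requires_grad = False
        # upcast layer-norm weights/biases to fp32 for stability;
        # with layer_norm_names=[] this branch never triggers, so they stay in fp16
        if param.ndim == 1 and any(ln in name for ln in layer_norm_names):
            param.data = param.data.to(torch.float32)
    return model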
@younesbelkada Thank you. With the following change the example now runs. To run the script on an RTX 4090 with 24G of GPU memory, the params ("block_size", "per_device_train_batch_size") need to be lowered:
pretrained_model = prepare_model_for_int8_training(model, layer_norm_names=[])
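For illustration, a run command could look like the following; the values are just an example and assume the script exposes block_size and per_device_train_batch_size as command-line flags via HfArgumentParser:

accelerate launch clm_finetune_peft_imdb.py --output_dir chk --block_size 512 --per_device_train_batch_size 4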
@younesbelkada I run gpt2-sentiment_peft.py and also get this error, but that code already uses layer_norm_names=[], so why do I still get the error?
@akk-123 This worked for me: pretrained_model = prepare_model_for_int8_training(pretrained_model, output_embedding_layer_name="embed_out", layer_norm_names=[])
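For context, a rough sketch (from memory of peft ~0.2.x, not the actual source) of why output_embedding_layer_name matters: the helper wraps the model's output head so its logits are upcast to fp32, and GPT-NeoX names that head embed_out rather than the default lm_head, so the default name has to be overridden for this model.

import torch
import torch.nn as nn

class CastOutputToFloatSketch(nn.Module):
    # wraps the output embedding / lm head so its logits come back in fp32
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x):
        return self.module(x).to(torch.float32)

# conceptually, for GPT-NeoX the helper ends up doing something like:
# model.embed_out = CastOutputToFloatSketch(model.embed_out)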
@dragstoll you add this code in which file?
@akk-123 in this one: https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt-neox-20b_peft/gpt-neo-20b_sentiment_peft.py line 179. I hope it helps.
@dragstoll @lvwerra thanks, I can run it by loading the model as below:
pretrained_model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
pretrained_model = prepare_model_for_int8_training(pretrained_model, output_embedding_layer_name="embed_out", layer_norm_names=[])
pretrained_model = get_peft_model(pretrained_model, lora_config)
model = AutoModelForCausalLMWithValueHead.from_pretrained(pretrained_model)
But when I try multi-GPU training, I get this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Hello @akk-123 ,
for the multi-GPU issue, this is expected as you are loading your model with device_map="auto". I would advise you to load the model as follows: https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt2-sentiment_peft.py#L143-L156
You need to install trl from source though:
pip install git+https://github.com/lvwerra/trl.git
The other option could be to modify the snippet you shared, with:
from accelerate import Accelerator
dummy_accelerator = Accelerator()
current_device = dummy_accelerator.process_index
pretrained_model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map={"":current_device})
tokenizer = AutoTokenizer.from_pretrained(model_name)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
pretrained_model = prepare_model_for_int8_training(pretrained_model, output_embedding_layer_name="embed_out", layer_norm_names=[])
pretrained_model = get_peft_model(pretrained_model, lora_config)
model = AutoModelForCausalLMWithValueHead.from_pretrained(pretrained_model)
@younesbelkada I tried both of the two, but I still get the error RuntimeError: expected scalar type Half but found Float.
Running gpt2-sentiment_peft.py with accelerate launch --main_process_port 1234 gpt2-sentiment-peft.py gives this error, as does the following snippet:

from accelerate import Accelerator

dummy_accelerator = Accelerator()
current_device = dummy_accelerator.process_index

pretrained_model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map={"": current_device})
tokenizer = AutoTokenizer.from_pretrained(model_name)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
pretrained_model = prepare_model_for_int8_training(pretrained_model, output_embedding_layer_name="embed_out", layer_norm_names=[])
pretrained_model = get_peft_model(pretrained_model, lora_config)
model = AutoModelForCausalLMWithValueHead.from_pretrained(pretrained_model)
Hi @akk-123 ,
We have changed the API a bit for peft models, can you load it as shown here: https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt2-sentiment_peft.py#L143-L157
Also, what is your peft version?
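For reference, a rough sketch of what the linked lines do, assuming trl's AutoModelForCausalLMWithValueHead.from_pretrained accepts a peft_config argument as in recent trl versions (check the linked file for the exact code; the model name below is only a placeholder):

from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

model_name = "facebook/opt-125m"  # hypothetical, for illustration only

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# the wrapper builds the peft model internally, so no separate get_peft_model call is needed
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
)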
Hi @akk-123 , We have changed the API a bit for peft models, can you load it as shown here: https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt2-sentiment_peft.py#L143-L157 Also what is your peft version?
My peft version is 0.3.0.dev0.
I just run gpt2-sentiment_peft.py with accelerate launch --main_process_port 1234 gpt2-sentiment-peft.py and it gives this error.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
----------------Environment: the same as https://huggingface.co/edbeeching/gpt-neo-125M-imdb-lora
Transformers 4.27.0.dev0
Pytorch 1.13.1+cuda116
Datasets 2.9.0
Tokenizers 0.13.2
trl 0.4.1.dev0 / trl 0.4.0
peft 0.2.0 / peft 0.3.0.dev0
----------------Run the following command:
cd trl/examples/sentiment/scripts/gpt-neox-20b_peft
accelerate launch clm_finetune_peft_imdb.py --output_dir chk
---------------The error log
The following values were not passed to accelerate launch and had defaults used instead:
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /data/anaconda3/envs/MyENV/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /data/anaconda3/envs/MyENV/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
--------------------model_args------------------
ModelArguments(model_name_or_path='/home/bmb/models/facebook/opt-125m')
--------------------data_args------------------
DataTrainingArguments(dataset_name='imdb', block_size=1024)
--------------------training_args------------------
TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=chk/runs/Mar14_16-56-10_meetyou-g2,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=chk,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=chk,
save_on_each_node=False,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
trainable params: 589824 || all params: 125829120 || trainable%: 0.46875
Found cached dataset imdb (/home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0)
100%|██████████| 3/3 [00:00<00:00, 883.45it/s]
Loading cached processed dataset at /home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-3066d04213395f11.arrow
Loading cached processed dataset at /home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-cbc9fa5ce7478c47.arrow
Loading cached processed dataset at /home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-e5a5583605ffc5c8.arrow
Loading cached processed dataset at /home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-76ee6f4f562a6026.arrow
Loading cached processed dataset at /home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-5ae97da8d5504da3.arrow
Loading cached processed dataset at /home/bmb/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-ba8185ba9646bd60.arrow
/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/2748 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
Traceback (most recent call last):
File "/home/bmb/projects/trl/examples/sentiment/scripts/gpt-neox-20b_peft/clm_finetune_peft_imdb.py", line 135, in <module>
trainer.train()
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/trainer.py", line 1631, in train
return inner_training_loop(
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/trainer.py", line 1900, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/trainer.py", line 2643, in training_step
loss = self.compute_loss(model, inputs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/trainer.py", line 2675, in compute_loss
outputs = model(**inputs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/peft/peft_model.py", line 530, in forward
return self.base_model(
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 930, in forward
outputs = self.model.decoder(
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 688, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 684, in custom_forward
return module(*inputs, output_attentions, None)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 323, in forward
hidden_states = self.self_attn_layer_norm(hidden_states)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Half but found Float
Traceback (most recent call last):
File "/data/anaconda3/envs/MyENV/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/commands/launch.py", line 915, in launch_command
simple_launcher(args)
File "/data/anaconda3/envs/MyENV/lib/python3.10/site-packages/accelerate/commands/launch.py", line 578, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)