Closed: AjayP13 closed this issue 1 year ago
Hi @AjayP13, can you share a small reproducible snippet for the issue?
Any update?
Hi @dsdanielpark @AjayP13, I would really appreciate it if anyone could provide a simple reproducer of the bug; that way I can open a fix very quickly and resolve the issue. Thanks!
@younesbelkada, I think this bug was caused by incorrect usage: saving the adapter to disk, loading it with Transformers's from_pretrained, and then applying PeftModel's from_pretrained on top of that (so the adapter was loaded twice, leaving multiple adapters on the model). Calling merge_and_unload on that model would then cause this error. Going to close this, as I don't think it's an issue that would occur with correct usage.
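For readers hitting this later, a minimal sketch of the problematic flow described above (the T5 base model comes from the issue description; paths are illustrative):
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

adapter_dir = "path/to/saved_adapter"  # directory written by trainer.save_model / save_pretrained (illustrative)

# With the PEFT integration in transformers, pointing from_pretrained at an adapter
# directory already attaches the adapter to the underlying base model...
model = AutoModelForSeq2SeqLM.from_pretrained(adapter_dir)

# ...so wrapping it with PeftModel.from_pretrained as well loads the adapter a second time
model = PeftModel.from_pretrained(model, adapter_dir)

# Calling merge_and_unload() on this doubly-wrapped model is what triggered the error reported here
model = model.merge_and_unload()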
trainer.save_model(saver_dir)
This way, the adapter will be saved (without the base model).
@younesbelkada @AjayP13 @dsdanielpark
trainer.train()
# Save the adapter
trainer.save_model(saver_dir)
# Retrieve the model from the trainer
model = trainer.model  # note that this returns the adapter-wrapped model, not the bare base model
# Load the adapter (wrapping the model a second time)
model = PeftModel.from_pretrained(model, model_id=saver_dir, device_map="auto")
If you change model = trainer.model to model = trainer.model.base_model, the error will be gone.
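A sketch of the change being described (illustrative; whether you need base_model or base_model.model can depend on the PEFT version):
# Unwrap the PEFT wrapper so the saved adapter is only attached once
base = trainer.model.base_model
model = PeftModel.from_pretrained(base, model_id=saver_dir, device_map="auto")

# With a single adapter attached, merging no longer raises the error from this issue
model = model.merge_and_unload()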
Hi @AjayP13! I'm encountering the same bug and have the same incorrect usage that you described. However, I'm struggling to find the correct usage and would really appreciate it if you could share your knowledge. Thanks again; I would appreciate any help and code examples for loading and merging the models, then saving or pushing to the Hub.
@SuperBruceJia I have the same error. Basically, I save my trainer, which saves the 360 MB adapters. However, I then need a merged model to convert to GGUF. How can I create a 13 GB model with my merged weights? The above does not show how to create a merged model that can be used for this purpose.
Could you please try:
save_path = "YOUR_SAVE_PATH"
model = trainer.model.base_model
model.save_pretrained(save_path)
Best regards,
Shuyue Jan 15th, 2024
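For the GGUF use case above, one approach that generally works is to rebuild the merged checkpoint from the base model plus the saved adapter and write it out in full (a sketch; model IDs, paths, and dtype are assumptions):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "BASE_MODEL_ID"        # the original full-size base model (assumption)
adapter_dir = "PATH_TO_ADAPTER"  # the ~360 MB adapter directory saved by trainer.save_model
merged_dir = "PATH_TO_MERGED"    # where the full merged checkpoint will be written

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

model = model.merge_and_unload()   # fold the LoRA weights into the base weights
model.save_pretrained(merged_dir)  # writes the full-size merged model
AutoTokenizer.from_pretrained(base_id).save_pretrained(merged_dir)

# merged_dir now holds a standalone checkpoint that GGUF conversion tools
# (e.g. llama.cpp's convert script) can consume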
Can anyone help with my problem? I am using model A in model B's structure; both derive from LLaVA. The code fails when I try to merge it into Hugging Face format; the error occurs when merging BigData-KSU/RS-llava-v1.5-7b-LoRA.
def parse_args(args):
parser = argparse.ArgumentParser(description="merge lora weights and save model with hf format")
parser.add_argument("--version", default="liuhaotian/llava-v1.5-7b")
parser.add_argument("--vis_save_path", default="./vis_output", type=str)
parser.add_argument("--precision", default="bf16", type=str, choices=["fp32", "bf16", "fp16"], help="precision for inference")
parser.add_argument("--vision_pretrained", default="./download_weights/sam_vit_h_4b8939.pth", type=str)
parser.add_argument("--out_dim", default=256, type=int)
parser.add_argument("--image_size", default=1024, type=int, help="image size")
parser.add_argument("--model_max_length", default=512, type=int)
parser.add_argument("--vision-tower", default="openai/clip-vit-large-patch14", type=str)
parser.add_argument("--lora_r", default=8, type=int)
parser.add_argument("--lora_alpha", default=16, type=int)
parser.add_argument("--lora_dropout", default=0.05, type=float)
parser.add_argument("--lora_target_modules", default="q_proj,v_proj", type=str)
parser.add_argument("--local_rank", default=-1, type=int, help="local rank for distributed training")
parser.add_argument("--train_mask_decoder", action="store_true", default=True)
parser.add_argument("--use_mm_start_end", action="store_true", default=True)
parser.add_argument("--conv_type", default="llava_v1", type=str, choices=["llava_v1", "llava_llama_2"])
parser.add_argument("--weight", default="/home/user/remote_Model_To_Test/lisa-7b-debug/pytorch_model.bin", type=str)
parser.add_argument("--save_path", default="/home/user/remote_Model_To_Test", type=str)
return parser.parse_args(args)
def main(args):
args = parse_args(args)
os.makedirs(args.vis_save_path, exist_ok=True)
# Create model
if args.version == "BigData-KSU/RS-llava-v1.5-7b-LoRA":
tokenizer_base = 'Intel/neural-chat-7b-v3-3'
else:
tokenizer_base = args.version
tokenizer = transformers.AutoTokenizer.from_pretrained(
tokenizer_base,
cache_dir=None,
model_max_length=args.model_max_length,
padding_side="right",
use_fast=False,
)
tokenizer.pad_token = tokenizer.unk_token
num_added_tokens = tokenizer.add_tokens("[SEG]")
args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]
if args.use_mm_start_end:
tokenizer.add_tokens(
[DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
)
model_args = {
"train_mask_decoder": args.train_mask_decoder,
"out_dim": args.out_dim,
"seg_token_idx": args.seg_token_idx,
"vision_tower": args.vision_tower,
}
torch_dtype = torch.float32
if args.precision == "bf16":
torch_dtype = torch.bfloat16
elif args.precision == "fp16":
torch_dtype = torch.half
model = LISAForCausalLM.from_pretrained(
args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=True, **model_args
)
model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.get_model().initialize_vision_modules(model.get_model().config)
vision_tower = model.get_model().get_vision_tower()
vision_tower.to(dtype=torch_dtype)
model.get_model().initialize_lisa_modules(model.get_model().config)
lora_r = args.lora_r
if lora_r > 0:
def find_linear_layers(model, lora_target_modules):
cls = torch.nn.Linear
lora_module_names = set()
for name, module in model.named_modules():
if (
isinstance(module, cls)
and all(
[
x not in name
for x in [
"visual_model",
"vision_tower",
"mm_projector",
"text_hidden_fcs",
]
]
)
and any([x in name for x in lora_target_modules])
):
lora_module_names.add(name)
return sorted(list(lora_module_names))
lora_alpha = args.lora_alpha
lora_dropout = args.lora_dropout
lora_target_modules = find_linear_layers(
model, args.lora_target_modules.split(",")
)
lora_config = LoraConfig(
r=lora_r,
lora_alpha=lora_alpha,
target_modules=lora_target_modules,
lora_dropout=lora_dropout,
bias="none",
task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
model.resize_token_embeddings(len(tokenizer))
print(f"loading {args.weight}")
state_dict = torch.load(args.weight, map_location="cpu")
# compare_state_dicts(state_dict, ckpt_state_dict)
model.load_state_dict(state_dict, strict=True)
model = model.merge_and_unload()
state_dict = {}
for k, v in model.state_dict().items():
if "vision_tower" not in k:
state_dict[k] = v
try:
model.save_pretrained(args.save_path, state_dict=state_dict)
except:
model.save_model(args.save_path)
I am facing the same problem, and the aforementioned method does not solve it. When I use the save_model function, it says it is not found on my model. My transformers version is 4.35 and peft is 0.5. The original model is liuhaotian/llava-v1.5-7b, which should inherit from the Llama model.
Hi @Roberyan, as this involves custom code, this is a question best placed in our forums.
Thanks, so this should be a problem with the model rather than with the transformers package, then?
@SuperBruceJia @AjayP13 @amyeroberts I don't use transformers' Trainer in this code; the model is trained with DeepSpeed. The merge code loads the model in exactly the same way, so my concern is: since the training code works fine and can store a DeepSpeed-style checkpoint, why does the merge code fail? model.load_state_dict(state_dict, strict=True) with strict loading also runs normally; only the model.save_pretrained call fails with this error. Could you help me with this issue? It is quite beyond my ability to solve, and the Hugging Face forum currently has no similar question.
Train code:
# Create model
if args.version == "BigData-KSU/RS-llava-v1.5-7b-LoRA":
tokenizer_base = 'Intel/neural-chat-7b-v3-3'
else:
tokenizer_base = args.version
tokenizer = transformers.AutoTokenizer.from_pretrained(
tokenizer_base,
cache_dir=None,
model_max_length=args.model_max_length,
padding_side="right",
use_fast=False,
)
tokenizer.pad_token = tokenizer.unk_token
num_added_tokens = tokenizer.add_tokens("[SEG]")
args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]
if args.use_mm_start_end:
tokenizer.add_tokens(
[DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
)
model_args = {
"train_mask_decoder": args.train_mask_decoder,
"out_dim": args.out_dim,
"ce_loss_weight": args.ce_loss_weight,
"dice_loss_weight": args.dice_loss_weight,
"bce_loss_weight": args.bce_loss_weight,
"seg_token_idx": args.seg_token_idx,
"vision_pretrained": args.vision_pretrained,
"vision_tower": args.vision_tower,
"use_mm_start_end": args.use_mm_start_end,
}
torch_dtype = torch.float32
if args.precision == "bf16":
torch_dtype = torch.bfloat16
elif args.precision == "fp16":
torch_dtype = torch.half
model = LISAForCausalLM.from_pretrained(
args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=False, **model_args
)
model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.enable_input_require_grads()
model.gradient_checkpointing_enable()
model.get_model().initialize_vision_modules(model.get_model().config)
vision_tower = model.get_model().get_vision_tower()
vision_tower.to(dtype=torch_dtype, device=args.local_rank)
if not args.eval_only:
model.get_model().initialize_lisa_modules(model.get_model().config)
for p in vision_tower.parameters():
p.requires_grad = False
for p in model.get_model().mm_projector.parameters():
p.requires_grad = False
conversation_lib.default_conversation = conversation_lib.conv_templates[
args.conv_type
]
lora_r = args.lora_r
if lora_r > 0:
def find_linear_layers(model, lora_target_modules):
cls = torch.nn.Linear
lora_module_names = set()
for name, module in model.named_modules():
if (
isinstance(module, cls)
and all(
[
x not in name
for x in [
"visual_model",
"vision_tower",
"mm_projector",
"text_hidden_fcs",
]
]
)
and any([x in name for x in lora_target_modules])
):
lora_module_names.add(name)
return sorted(list(lora_module_names))
lora_alpha = args.lora_alpha
lora_dropout = args.lora_dropout
lora_target_modules = find_linear_layers(
model, args.lora_target_modules.split(",")
)
lora_config = LoraConfig(
r=lora_r,
lora_alpha=lora_alpha,
target_modules=lora_target_modules,
lora_dropout=lora_dropout,
bias="none",
task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
model.resize_token_embeddings(len(tokenizer))
# make text_hidden_fcs, mask_decoder, lm_head, embed_tokens trainable
for n, p in model.named_parameters():
if any(
[
x in n
for x in ["lm_head", "embed_tokens", "mask_decoder", "text_hidden_fcs"]
]
):
print("n: ", n, "p.shape: ", p.shape)
p.requires_grad = True
world_size = torch.cuda.device_count() #4
args.distributed = world_size > 1
train_dataset = HybridDataset(
args.dataset_dir,
tokenizer,
args.vision_tower,
samples_per_epoch=args.batch_size # 1
* args.grad_accumulation_steps # 10
* args.steps_per_epoch # 500
* world_size, # gpu 4
precision=args.precision,
image_size=args.image_size,
num_classes_per_sample=args.num_classes_per_sample,
exclude_val=args.exclude_val,
dataset=args.dataset,
sample_rate=[float(x) for x in args.sample_rates.split(",")],
sem_seg_data=args.sem_seg_data,
refer_seg_data=args.refer_seg_data,
vqa_data=args.vqa_data,
reason_seg_data=args.reason_seg_data,
explanatory=args.explanatory,
rs_sem_seg_data = args.rs_sem_seg_data,
rs_refer_seg_data=args.rs_refer_seg_data,
rs_vqa_data=args.rs_vqa_data,
)
if args.no_eval == False:
val_dataset = ValDataset(
args.dataset_dir,
tokenizer,
args.vision_tower,
args.val_dataset,
args.image_size,
)
print(
f"Training with {len(train_dataset)} examples and validating with {len(val_dataset)} examples."
)
else:
val_dataset = None
print(f"Training with {len(train_dataset)} examples.")
ds_config = {
"train_micro_batch_size_per_gpu": args.batch_size,
"gradient_accumulation_steps": args.grad_accumulation_steps,
"optimizer": {
"type": "AdamW",
"params": {
"lr": args.lr,
"weight_decay": 0.0,
"betas": (args.beta1, args.beta2),
},
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"total_num_steps": args.epochs * args.steps_per_epoch,
"warmup_min_lr": 0,
"warmup_max_lr": args.lr,
"warmup_num_steps": 100,
"warmup_type": "linear",
},
},
"fp16": {
"enabled": args.precision == "fp16",
},
"bf16": {
"enabled": args.precision == "bf16",
},
"gradient_clipping": 1.0,
"zero_optimization": {
"stage": 2,
"contiguous_gradients": True,
"overlap_comm": True,
"reduce_scatter": True,
"allgather_partitions": True,
"offload_optimizer": {
"device": "cpu",
"pin_memory": True
},
"reduce_bucket_size": 5e8,
"allgather_bucket_size": 5e8,
},
}
model_engine, optimizer, train_loader, scheduler = deepspeed.initialize(
model=model,
model_parameters=model.parameters(),
training_data=train_dataset,
collate_fn=partial(
collate_fn,
tokenizer=tokenizer,
conv_type=args.conv_type,
use_mm_start_end=args.use_mm_start_end,
local_rank=args.local_rank,
),
config=ds_config,
)
# resume deepspeed checkpoint
if args.auto_resume and len(args.resume) == 0:
resume = os.path.join(args.log_dir, "ckpt_model")
if os.path.exists(resume):
args.resume = resume
if args.resume:
if args.eval_only:
load_path, client_state = model_engine.load_checkpoint(args.resume, load_module_only=True)
else:
load_path, client_state = model_engine.load_checkpoint(args.resume)
with open(os.path.join(args.resume, "latest"), "r") as f:
ckpt_dir = f.readlines()[0].strip()
args.start_epoch = (
int(ckpt_dir.replace("global_step", "")) // args.steps_per_epoch
)
print(
"resume training from {}, start from epoch {}".format(
args.resume, args.start_epoch
)
)
# validation dataset
if val_dataset is not None:
assert args.val_batch_size == 1
val_sampler = torch.utils.data.distributed.DistributedSampler(
val_dataset, shuffle=False, drop_last=False
)
val_loader = torch.utils.data.DataLoader(
val_dataset,
batch_size=args.val_batch_size,
shuffle=False,
num_workers=args.workers,
pin_memory=False,
sampler=val_sampler,
collate_fn=partial(
collate_fn,
tokenizer=tokenizer,
conv_type=args.conv_type,
use_mm_start_end=args.use_mm_start_end,
local_rank=args.local_rank,
),
)
train_iter = iter(train_loader)
best_score, cur_ciou = 0.0, 0.0
if args.eval_only:
giou, ciou = validate(val_loader, model_engine, 0, writer, args)
exit()
for epoch in range(args.start_epoch, args.epochs):
# train for one epoch
train_iter = train(
train_loader,
model_engine,
epoch,
scheduler,
writer,
train_iter,
args,
)
if args.no_eval == False:
giou, ciou = validate(val_loader, model_engine, epoch, writer, args)
is_best = giou > best_score
best_score = max(giou, best_score)
cur_ciou = ciou if is_best else cur_ciou
save_dir = os.path.join(args.log_dir, "ckpt_model")
if args.local_rank == 0:
torch.save(
{"epoch": epoch},
os.path.join(
args.log_dir,
"meta_log_giou{:.3f}_ciou{:.3f}.pth".format(
best_score, cur_ciou
),
),
)
if os.path.exists(save_dir):
shutil.rmtree(save_dir)
torch.distributed.barrier()
model_engine.save_checkpoint(save_dir)
if is_best:
best_save_dir = os.path.join(args.log_dir, "best_val_model")
if args.local_rank == 0:
if os.path.exists(best_save_dir):
shutil.rmtree(best_save_dir)
torch.distributed.barrier()
model_engine.save_checkpoint(best_save_dir)
Merge code:
def parse_args(args):
parser = argparse.ArgumentParser(description="merge lora weights and save model with hf format")
parser.add_argument("--version", default="BigData-KSU/RS-llava-v1.5-7b-LoRA") # liuhaotian/llava-v1.5-7b
parser.add_argument("--vis_save_path", default="./vis_output", type=str)
parser.add_argument("--precision", default="bf16", type=str, choices=["fp32", "bf16", "fp16"], help="precision for inference")
parser.add_argument("--vision_pretrained", default="./download_weights/sam_vit_h_4b8939.pth", type=str)
parser.add_argument("--out_dim", default=256, type=int)
parser.add_argument("--image_size", default=1024, type=int, help="image size")
parser.add_argument("--model_max_length", default=512, type=int)
parser.add_argument("--vision-tower", default="openai/clip-vit-large-patch14", type=str)
parser.add_argument("--lora_r", default=8, type=int)
parser.add_argument("--lora_alpha", default=16, type=int)
parser.add_argument("--lora_dropout", default=0.05, type=float)
parser.add_argument("--lora_target_modules", default="q_proj,v_proj", type=str)
parser.add_argument("--local_rank", default=-1, type=int, help="local rank for distributed training")
parser.add_argument("--train_mask_decoder", action="store_true", default=True)
parser.add_argument("--use_mm_start_end", action="store_true", default=True)
parser.add_argument("--conv_type", default="llava_v1", type=str, choices=["llava_v1", "llava_llama_2"])
parser.add_argument("--weight", default="/home/user/model_storage/rs-lisa-7b/pytorch_model.bin", type=str)
parser.add_argument("--save_path", default="/home/user/rs-lisa-7b", type=str)
return parser.parse_args(args)
def main(args):
args = parse_args(args)
os.makedirs(args.vis_save_path, exist_ok=True)
# Create model
if args.version == "BigData-KSU/RS-llava-v1.5-7b-LoRA":
tokenizer_base = 'Intel/neural-chat-7b-v3-3'
else:
tokenizer_base = args.version
tokenizer = transformers.AutoTokenizer.from_pretrained(
tokenizer_base,
cache_dir=None,
model_max_length=args.model_max_length,
padding_side="right",
use_fast=False,
)
tokenizer.pad_token = tokenizer.unk_token
num_added_tokens = tokenizer.add_tokens("[SEG]")
args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]
if args.use_mm_start_end:
tokenizer.add_tokens(
[DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
)
model_args = {
"train_mask_decoder": args.train_mask_decoder,
"out_dim": args.out_dim,
"seg_token_idx": args.seg_token_idx,
"vision_tower": args.vision_tower,
}
torch_dtype = torch.float32
if args.precision == "bf16":
torch_dtype = torch.bfloat16
elif args.precision == "fp16":
torch_dtype = torch.half
model = LISAForCausalLM.from_pretrained(
args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=True, **model_args
)
model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.get_model().initialize_vision_modules(model.get_model().config)
vision_tower = model.get_model().get_vision_tower()
vision_tower.to(dtype=torch_dtype)
model.get_model().initialize_lisa_modules(model.get_model().config)
lora_r = args.lora_r
if lora_r > 0:
def find_linear_layers(model, lora_target_modules):
cls = torch.nn.Linear
lora_module_names = set()
for name, module in model.named_modules():
if (
isinstance(module, cls)
and all(
[
x not in name
for x in [
"visual_model",
"vision_tower",
"mm_projector",
"text_hidden_fcs",
]
]
)
and any([x in name for x in lora_target_modules])
):
lora_module_names.add(name)
return sorted(list(lora_module_names))
lora_alpha = args.lora_alpha
lora_dropout = args.lora_dropout
lora_target_modules = find_linear_layers(
model, args.lora_target_modules.split(",")
)
lora_config = LoraConfig(
r=lora_r,
lora_alpha=lora_alpha,
target_modules=lora_target_modules,
lora_dropout=lora_dropout,
bias="none",
task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
model.resize_token_embeddings(len(tokenizer))
print(f"loading {args.weight}")
state_dict = torch.load(args.weight, map_location="cpu")
model.load_state_dict(state_dict, strict=True)
model = model.merge_and_unload()
state_dict = {}
for k, v in model.state_dict().items():
if "vision_tower" not in k:
state_dict[k] = v
model.save_pretrained(args.save_path, state_dict=state_dict)
tokenizer.save_pretrained(args.save_path)
model = model.merge_and_unload() # This can take several minutes on cpu
model._hf_peft_config_loaded = False
model.save_pretrained("LLM2Vec-Mistral-7B-Instruct-v2-mnt-merged")
This solved the issue
Kudos to https://github.com/vaibhavad
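For anyone else hitting the save_pretrained failure after merge_and_unload, here is that workaround in context (a sketch; the model names are illustrative, and _hf_peft_config_loaded is a private transformers attribute, so this may change across versions):
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("BASE_MODEL_ID")  # illustrative base checkpoint
model = PeftModel.from_pretrained(base, "PATH_TO_ADAPTER")    # illustrative adapter directory

model = model.merge_and_unload()      # returns the plain base model with the LoRA weights folded in
model._hf_peft_config_loaded = False  # per the comment above, stops save_pretrained from trying to save an adapter that no longer exists
model.save_pretrained("PATH_TO_MERGED")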
System Info
Happens when doing save_pretrained() or push_to_hub() on a T5-small model with a single LoraConfig after doing merge_and_unload(). This has now broken merge_and_unload(), as you can't do anything with the model.
Who can help?
@younesbelkada @patrickvonplaten
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
N/A