huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

UnboundLocalError: local variable 'active_adapters' referenced before assignment #26972

Closed AjayP13 closed 1 year ago

AjayP13 commented 1 year ago

System Info

Happens when doing save_pretrained() or push_to_hub() on a T5-small model with a single LoraConfig after doing merge_and_unload().

This effectively breaks merge_and_unload(), since the merged model can no longer be saved or pushed.

transformers==4.34.0
peft==0.5.0

Who can help?

@younesbelkada @patrickvonplaten

Information

Tasks

Reproduction

    def active_adapters(self) -> List[str]:
        """
        If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT
        official documentation: https://huggingface.co/docs/peft

        Gets the current active adapters of the model. In case of multi-adapter inference (combining multiple adapters
        for inference) returns the list of all active adapters so that users can deal with them accordingly.

        For previous PEFT versions (that do not support multi-adapter inference), `module.active_adapter` will return
        a single string.
        """
        check_peft_version(min_version=MIN_PEFT_VERSION)

        if not is_peft_available():
            raise ImportError("PEFT is not available. Please install PEFT to use this function: `pip install peft`.")

        if not self._hf_peft_config_loaded:
            raise ValueError("No adapter loaded. Please load an adapter first.")

        from peft.tuners.tuners_utils import BaseTunerLayer

        for _, module in self.named_modules():
            if isinstance(module, BaseTunerLayer):
                active_adapters = module.active_adapter
                break

        # For previous PEFT versions
>       if isinstance(active_adapters, str):
E       UnboundLocalError: local variable 'active_adapters' referenced before assignment
train.py:415: in publish_to_hf_hub
    model.save_pretrained(
.../lib/python3.10/site-packages/transformers/modeling_utils.py:2002: in save_pretrained
    state_dict = model_to_save.get_adapter_state_dict()
.../lib/python3.10/site-packages/transformers/integrations/peft.py:415: in get_adapter_state_dict
    adapter_name = self.active_adapter()
.../lib/python3.10/site-packages/transformers/integrations/peft.py:393: in active_adapter
    return self.active_adapters()[0]

Expected behavior

N/A

younesbelkada commented 1 year ago

Hi @AjayP13, can you share a small reproducible snippet for the issue?

dsdanielpark commented 1 year ago

Any update?

younesbelkada commented 1 year ago

Hi @dsdanielpark @AjayP13, I would really appreciate it if anyone could provide a simple reproducer of the bug; that way I can open a fix quickly and resolve the issue. Thanks!

AjayP13 commented 1 year ago

@younesbelkada, I think this bug was caused by incorrect usage: saving the adapter to disk, loading it with Transformers' from_pretrained, and then applying PeftModel's from_pretrained on top of that (so the adapter gets loaded twice, resulting in multiple adapters); calling merge_and_unload after that triggers this error. Going to close this, as I don't think it's an issue that occurs when the APIs are used correctly.
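
For reference, a minimal sketch of the usage pattern described above (untested; the model name and paths are illustrative):

from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, PeftModel, get_peft_model

# Train / create a LoRA adapter and save only the adapter weights
base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
peft_model = get_peft_model(base, LoraConfig(task_type="SEQ_2_SEQ_LM"))
peft_model.save_pretrained("my-lora-adapter")

# Incorrect usage: the adapter ends up attached twice
model = AutoModelForSeq2SeqLM.from_pretrained("my-lora-adapter")      # Transformers loads base model + adapter
model = PeftModel.from_pretrained(model, "my-lora-adapter")           # PEFT attaches the same adapter again

model = model.merge_and_unload()
model.save_pretrained("merged-model")  # this is where the UnboundLocalError in active_adapters() shows up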

SuperBruceJia commented 1 year ago

@younesbelkada, I think this bug was caused by incorrect usage: saving the adapter to disk, loading it with Transformers' from_pretrained, and then applying PeftModel's from_pretrained on top of that (so the adapter gets loaded twice, resulting in multiple adapters); calling merge_and_unload after that triggers this error. Going to close this, as I don't think it's an issue that occurs when the APIs are used correctly.

trainer.save_model(saver_dir)

This way, the adapter will be saved (without the base model).

SuperBruceJia commented 1 year ago

Hi @dsdanielpark @AjayP13, I would really appreciate it if anyone could provide a simple reproducer of the bug; that way I can open a fix quickly and resolve the issue. Thanks!

@younesbelkada @AjayP13 @dsdanielpark

trainer.train()

# Save the adapter
trainer.save_model(saver_dir)

# Retrieve the base model
model = trainer.model  # Please note that in this way, only the adapter will be returned without the base model

# Loading the adapter
model = PeftModel.from_pretrained(model, model_id=saver_dir, device_map="auto")

If you change model = trainer.model to model = trainer.model.base_model, the error will be gone.
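
In other words (a sketch, assuming trainer.model is a PeftModel and saver_dir points at the saved adapter):

model = trainer.model.base_model  # unwrap the PEFT wrapper so the adapter is not attached twice
model = PeftModel.from_pretrained(model, model_id=saver_dir, device_map="auto")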

luizakar2002 commented 11 months ago

@younesbelkada, I think this bug was caused by incorrect usage: saving the adapter to disk, loading it with Transformers' from_pretrained, and then applying PeftModel's from_pretrained on top of that (so the adapter gets loaded twice, resulting in multiple adapters); calling merge_and_unload after that triggers this error. Going to close this, as I don't think it's an issue that occurs when the APIs are used correctly.

Hi @AjayP13! I'm encountering the same bug and have the same incorrect usage that you described. However, I'm struggling to find the correct usage and would really appreciate it if you could share your knowledge. Thanks again; I would appreciate any help and code examples for loading, merging the models, and saving or pushing to the Hub.

candcconsulting commented 10 months ago

@SuperBruceJia I have the same error. Basically, I save my trainer, which saves the 360 MB adapters. However, I then need a merged model to convert to GGUF. How can I create a 13 GB model with my merged weights? The above does not show how to create a merged model that can be used for this purpose.

SuperBruceJia commented 10 months ago

@SuperBruceJia I have the same error. Basically, I save my trainer, which saves the 360 MB adapters. However, I then need a merged model to convert to GGUF. How can I create a 13 GB model with my merged weights? The above does not show how to create a merged model that can be used for this purpose.

Could you please try:

save_path = "YOUR_SAVE_PATH"

model = trainer.model.base_model
model.save_pretrained(save_path)
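
If you specifically need the fully merged weights (for example to convert to GGUF), here is a rough, untested sketch assuming trainer.model is still a PeftModel:

merged = trainer.model.merge_and_unload()  # folds the LoRA weights into the base weights
merged._hf_peft_config_loaded = False      # workaround for the save_pretrained error discussed in this issue
merged.save_pretrained(save_path)          # writes the full merged checkpoint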

Best regards,

Shuyue Jan 15th, 2024

Roberyan commented 5 months ago

Can anyone help with my problem? I am using model A inside model B's structure; both derive from LLaVA. The code fails when I try to merge it into Hugging Face format, and the error occurs when merging BigData-KSU/RS-llava-v1.5-7b-LoRA:

def parse_args(args):
    parser = argparse.ArgumentParser(description="merge lora weights and save model with hf format")
    parser.add_argument("--version", default="liuhaotian/llava-v1.5-7b")
    parser.add_argument("--vis_save_path", default="./vis_output", type=str)
    parser.add_argument("--precision", default="bf16", type=str, choices=["fp32", "bf16", "fp16"], help="precision for inference")
    parser.add_argument("--vision_pretrained", default="./download_weights/sam_vit_h_4b8939.pth", type=str)
    parser.add_argument("--out_dim", default=256, type=int)
    parser.add_argument("--image_size", default=1024, type=int, help="image size")
    parser.add_argument("--model_max_length", default=512, type=int)
    parser.add_argument("--vision-tower", default="openai/clip-vit-large-patch14", type=str)
    parser.add_argument("--lora_r", default=8, type=int)
    parser.add_argument("--lora_alpha", default=16, type=int)
    parser.add_argument("--lora_dropout", default=0.05, type=float)
    parser.add_argument("--lora_target_modules", default="q_proj,v_proj", type=str)
    parser.add_argument("--local_rank", default=-1, type=int, help="local rank for distributed training")
    parser.add_argument("--train_mask_decoder", action="store_true", default=True)
    parser.add_argument("--use_mm_start_end", action="store_true", default=True)
    parser.add_argument("--conv_type", default="llava_v1", type=str, choices=["llava_v1", "llava_llama_2"])
    parser.add_argument("--weight", default="/home/user/remote_Model_To_Test/lisa-7b-debug/pytorch_model.bin", type=str)
    parser.add_argument("--save_path", default="/home/user/remote_Model_To_Test", type=str)
    return parser.parse_args(args)

def main(args):
    args = parse_args(args)
    os.makedirs(args.vis_save_path, exist_ok=True)

    # Create model
    if args.version == "BigData-KSU/RS-llava-v1.5-7b-LoRA":
        tokenizer_base = 'Intel/neural-chat-7b-v3-3'
    else:
        tokenizer_base = args.version

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        tokenizer_base,
        cache_dir=None,
        model_max_length=args.model_max_length,
        padding_side="right",
        use_fast=False,
    )
    tokenizer.pad_token = tokenizer.unk_token
    num_added_tokens = tokenizer.add_tokens("[SEG]")
    args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]

    if args.use_mm_start_end:
        tokenizer.add_tokens(
            [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
        )

    model_args = {
        "train_mask_decoder": args.train_mask_decoder,
        "out_dim": args.out_dim,
        "seg_token_idx": args.seg_token_idx,
        "vision_tower": args.vision_tower,
    }

    torch_dtype = torch.float32
    if args.precision == "bf16":
        torch_dtype = torch.bfloat16
    elif args.precision == "fp16":
        torch_dtype = torch.half
    model = LISAForCausalLM.from_pretrained(
        args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=True, **model_args
    )
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    model.get_model().initialize_vision_modules(model.get_model().config)
    vision_tower = model.get_model().get_vision_tower()
    vision_tower.to(dtype=torch_dtype)
    model.get_model().initialize_lisa_modules(model.get_model().config)

    lora_r = args.lora_r
    if lora_r > 0:

        def find_linear_layers(model, lora_target_modules):
            cls = torch.nn.Linear
            lora_module_names = set()
            for name, module in model.named_modules():
                if (
                    isinstance(module, cls)
                    and all(
                        [
                            x not in name
                            for x in [
                                "visual_model",
                                "vision_tower",
                                "mm_projector",
                                "text_hidden_fcs",
                            ]
                        ]
                    )
                    and any([x in name for x in lora_target_modules])
                ):
                    lora_module_names.add(name)
            return sorted(list(lora_module_names))

        lora_alpha = args.lora_alpha
        lora_dropout = args.lora_dropout
        lora_target_modules = find_linear_layers(
            model, args.lora_target_modules.split(",")
        )
        lora_config = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            target_modules=lora_target_modules,
            lora_dropout=lora_dropout,
            bias="none",
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()

    model.resize_token_embeddings(len(tokenizer))

    print(f"loading {args.weight}")
    state_dict = torch.load(args.weight, map_location="cpu")
    # compare_state_dicts(state_dict, ckpt_state_dict)

    model.load_state_dict(state_dict, strict=True)
    model = model.merge_and_unload()
    state_dict = {}
    for k, v in model.state_dict().items():
        if "vision_tower" not in k:
            state_dict[k] = v
    try:
        model.save_pretrained(args.save_path, state_dict=state_dict)
    except:
        model.save_model(args.save_path)

I am facing the same problem, and the method mentioned above does not solve it. When I fall back to the save_model function, it reports that the function is not found on my model. My transformers version is 4.35 and peft is 0.5. The original model is liuhaotian/llava-v1.5-7b, which should inherit from the Llama model.

amyeroberts commented 5 months ago

Hi @Roberyan, as this involves custom code this is a question best placed in our forums.

Roberyan commented 5 months ago

Hi @Roberyan, as this involves custom code this is a question best placed in our forums.

Thanks, so this should be a problem with the model rather than with the transformers package then?

Roberyan commented 5 months ago

@SuperBruceJia @AjayP13 @amyeroberts I don't use transformers.Trainer in this code; the model is trained with DeepSpeed. The merge code loads the model in exactly the same way as the training code. My concern is: since the training code works fine and can store a DeepSpeed-style checkpoint, why does the merge code fail? model.load_state_dict(state_dict, strict=True) also runs without problems; only the model.save_pretrained call fails with this error. Could you help me with this issue? It is quite beyond my ability to solve, and the Hugging Face forum currently has no similar question.

Train code:

 # Create model
    if args.version == "BigData-KSU/RS-llava-v1.5-7b-LoRA":
        tokenizer_base = 'Intel/neural-chat-7b-v3-3'
    else:
        tokenizer_base = args.version

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        tokenizer_base,
        cache_dir=None,
        model_max_length=args.model_max_length,
        padding_side="right",
        use_fast=False,
    )
    tokenizer.pad_token = tokenizer.unk_token
    num_added_tokens = tokenizer.add_tokens("[SEG]")
    args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]

    if args.use_mm_start_end:
        tokenizer.add_tokens(
            [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
        )

    model_args = {
        "train_mask_decoder": args.train_mask_decoder,
        "out_dim": args.out_dim,
        "ce_loss_weight": args.ce_loss_weight,
        "dice_loss_weight": args.dice_loss_weight,
        "bce_loss_weight": args.bce_loss_weight,
        "seg_token_idx": args.seg_token_idx,
        "vision_pretrained": args.vision_pretrained,
        "vision_tower": args.vision_tower,
        "use_mm_start_end": args.use_mm_start_end,
    }
    torch_dtype = torch.float32
    if args.precision == "bf16":
        torch_dtype = torch.bfloat16
    elif args.precision == "fp16":
        torch_dtype = torch.half
    model = LISAForCausalLM.from_pretrained(
        args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=False, **model_args
    )
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    model.enable_input_require_grads()
    model.gradient_checkpointing_enable()

    model.get_model().initialize_vision_modules(model.get_model().config)
    vision_tower = model.get_model().get_vision_tower()
    vision_tower.to(dtype=torch_dtype, device=args.local_rank)
    if not args.eval_only:
        model.get_model().initialize_lisa_modules(model.get_model().config)

    for p in vision_tower.parameters():
        p.requires_grad = False
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = False

    conversation_lib.default_conversation = conversation_lib.conv_templates[
        args.conv_type
    ]

    lora_r = args.lora_r
    if lora_r > 0:

        def find_linear_layers(model, lora_target_modules):
            cls = torch.nn.Linear
            lora_module_names = set()
            for name, module in model.named_modules():
                if (
                    isinstance(module, cls)
                    and all(
                        [
                            x not in name
                            for x in [
                                "visual_model",
                                "vision_tower",
                                "mm_projector",
                                "text_hidden_fcs",
                            ]
                        ]
                    )
                    and any([x in name for x in lora_target_modules])
                ):
                    lora_module_names.add(name)
            return sorted(list(lora_module_names))

        lora_alpha = args.lora_alpha
        lora_dropout = args.lora_dropout
        lora_target_modules = find_linear_layers(
            model, args.lora_target_modules.split(",")
        )
        lora_config = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            target_modules=lora_target_modules,
            lora_dropout=lora_dropout,
            bias="none",
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()

    model.resize_token_embeddings(len(tokenizer))

    # make text_hidden_fcs, mask_decoder, lm_head, embed_tokens trainable
    for n, p in model.named_parameters():
        if any(
            [
                x in n
                for x in ["lm_head", "embed_tokens", "mask_decoder", "text_hidden_fcs"]
            ]
        ):
            print("n: ", n, "p.shape: ", p.shape)
            p.requires_grad = True

    world_size = torch.cuda.device_count() #4
    args.distributed = world_size > 1

    train_dataset = HybridDataset(
        args.dataset_dir,
        tokenizer,
        args.vision_tower,
        samples_per_epoch=args.batch_size # 1
        * args.grad_accumulation_steps # 10
        * args.steps_per_epoch # 500
        * world_size, # gpu 4
        precision=args.precision,
        image_size=args.image_size,
        num_classes_per_sample=args.num_classes_per_sample,
        exclude_val=args.exclude_val,
        dataset=args.dataset,
        sample_rate=[float(x) for x in args.sample_rates.split(",")],
        sem_seg_data=args.sem_seg_data,
        refer_seg_data=args.refer_seg_data,
        vqa_data=args.vqa_data,
        reason_seg_data=args.reason_seg_data,
        explanatory=args.explanatory,
        rs_sem_seg_data = args.rs_sem_seg_data,
        rs_refer_seg_data=args.rs_refer_seg_data,
        rs_vqa_data=args.rs_vqa_data,
    )

    if args.no_eval == False:
        val_dataset = ValDataset(
            args.dataset_dir,
            tokenizer,
            args.vision_tower,
            args.val_dataset,
            args.image_size,
        )
        print(
            f"Training with {len(train_dataset)} examples and validating with {len(val_dataset)} examples."
        )
    else:
        val_dataset = None
        print(f"Training with {len(train_dataset)} examples.")

    ds_config = {
        "train_micro_batch_size_per_gpu": args.batch_size,
        "gradient_accumulation_steps": args.grad_accumulation_steps,
        "optimizer": {
            "type": "AdamW",
            "params": {
                "lr": args.lr,
                "weight_decay": 0.0,
                "betas": (args.beta1, args.beta2),
            },
        },
        "scheduler": {
            "type": "WarmupDecayLR",
            "params": {
                "total_num_steps": args.epochs * args.steps_per_epoch,
                "warmup_min_lr": 0,
                "warmup_max_lr": args.lr,
                "warmup_num_steps": 100,
                "warmup_type": "linear",
            },
        },
        "fp16": {
            "enabled": args.precision == "fp16",
        },
        "bf16": {
            "enabled": args.precision == "bf16",
        },
        "gradient_clipping": 1.0,
        "zero_optimization": {
            "stage": 2,
            "contiguous_gradients": True,
            "overlap_comm": True,
            "reduce_scatter": True,
            "allgather_partitions": True,
            "offload_optimizer": {
                "device": "cpu",
                "pin_memory": True
            },
            "reduce_bucket_size": 5e8,
            "allgather_bucket_size": 5e8,

        },
    }
    model_engine, optimizer, train_loader, scheduler = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        training_data=train_dataset,
        collate_fn=partial(
            collate_fn,
            tokenizer=tokenizer,
            conv_type=args.conv_type,
            use_mm_start_end=args.use_mm_start_end,
            local_rank=args.local_rank,
        ),
        config=ds_config,
    )

    # resume deepspeed checkpoint
    if args.auto_resume and len(args.resume) == 0:
        resume = os.path.join(args.log_dir, "ckpt_model")
        if os.path.exists(resume):
            args.resume = resume

    if args.resume:
        if args.eval_only:
            load_path, client_state = model_engine.load_checkpoint(args.resume, load_module_only=True)
        else:
            load_path, client_state = model_engine.load_checkpoint(args.resume)
        with open(os.path.join(args.resume, "latest"), "r") as f:
            ckpt_dir = f.readlines()[0].strip()
        args.start_epoch = (
            int(ckpt_dir.replace("global_step", "")) // args.steps_per_epoch
        )
        print(
            "resume training from {}, start from epoch {}".format(
                args.resume, args.start_epoch
            )
        )

    # validation dataset
    if val_dataset is not None:
        assert args.val_batch_size == 1
        val_sampler = torch.utils.data.distributed.DistributedSampler(
            val_dataset, shuffle=False, drop_last=False
        )
        val_loader = torch.utils.data.DataLoader(
            val_dataset,
            batch_size=args.val_batch_size,
            shuffle=False,
            num_workers=args.workers,
            pin_memory=False,
            sampler=val_sampler,
            collate_fn=partial(
                collate_fn,
                tokenizer=tokenizer,
                conv_type=args.conv_type,
                use_mm_start_end=args.use_mm_start_end,
                local_rank=args.local_rank,
            ),
        )

    train_iter = iter(train_loader)
    best_score, cur_ciou = 0.0, 0.0

    if args.eval_only:
        giou, ciou = validate(val_loader, model_engine, 0, writer, args)
        exit()

    for epoch in range(args.start_epoch, args.epochs):
        # train for one epoch
        train_iter = train(
            train_loader,
            model_engine,
            epoch,
            scheduler,
            writer,
            train_iter,
            args,
        )

        if args.no_eval == False:
            giou, ciou = validate(val_loader, model_engine, epoch, writer, args)
            is_best = giou > best_score
            best_score = max(giou, best_score)
            cur_ciou = ciou if is_best else cur_ciou

        save_dir = os.path.join(args.log_dir, "ckpt_model")
        if args.local_rank == 0:
            torch.save(
                {"epoch": epoch},
                os.path.join(
                    args.log_dir,
                    "meta_log_giou{:.3f}_ciou{:.3f}.pth".format(
                        best_score, cur_ciou
                    ),
                ),
            )
            if os.path.exists(save_dir):
                shutil.rmtree(save_dir)
        torch.distributed.barrier()
        model_engine.save_checkpoint(save_dir)
        if is_best:
            best_save_dir = os.path.join(args.log_dir, "best_val_model")
            if args.local_rank == 0:
                if os.path.exists(best_save_dir):
                    shutil.rmtree(best_save_dir)
            torch.distributed.barrier()
            model_engine.save_checkpoint(best_save_dir)

Merge code:

def parse_args(args):
    parser = argparse.ArgumentParser(description="merge lora weights and save model with hf format")
    parser.add_argument("--version", default="BigData-KSU/RS-llava-v1.5-7b-LoRA") # liuhaotian/llava-v1.5-7b
    parser.add_argument("--vis_save_path", default="./vis_output", type=str)
    parser.add_argument("--precision", default="bf16", type=str, choices=["fp32", "bf16", "fp16"], help="precision for inference")
    parser.add_argument("--vision_pretrained", default="./download_weights/sam_vit_h_4b8939.pth", type=str)
    parser.add_argument("--out_dim", default=256, type=int)
    parser.add_argument("--image_size", default=1024, type=int, help="image size")
    parser.add_argument("--model_max_length", default=512, type=int)
    parser.add_argument("--vision-tower", default="openai/clip-vit-large-patch14", type=str)
    parser.add_argument("--lora_r", default=8, type=int)
    parser.add_argument("--lora_alpha", default=16, type=int)
    parser.add_argument("--lora_dropout", default=0.05, type=float)
    parser.add_argument("--lora_target_modules", default="q_proj,v_proj", type=str)
    parser.add_argument("--local_rank", default=-1, type=int, help="local rank for distributed training")
    parser.add_argument("--train_mask_decoder", action="store_true", default=True)
    parser.add_argument("--use_mm_start_end", action="store_true", default=True)
    parser.add_argument("--conv_type", default="llava_v1", type=str, choices=["llava_v1", "llava_llama_2"])
    parser.add_argument("--weight", default="/home/user/model_storage/rs-lisa-7b/pytorch_model.bin", type=str)
    parser.add_argument("--save_path", default="/home/user/rs-lisa-7b", type=str)
    return parser.parse_args(args)

def main(args):
    args = parse_args(args)
    os.makedirs(args.vis_save_path, exist_ok=True)

    # Create model
    if args.version == "BigData-KSU/RS-llava-v1.5-7b-LoRA":
        tokenizer_base = 'Intel/neural-chat-7b-v3-3'
    else:
        tokenizer_base = args.version

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        tokenizer_base,
        cache_dir=None,
        model_max_length=args.model_max_length,
        padding_side="right",
        use_fast=False,
    )
    tokenizer.pad_token = tokenizer.unk_token
    num_added_tokens = tokenizer.add_tokens("[SEG]")
    args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]

    if args.use_mm_start_end:
        tokenizer.add_tokens(
            [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
        )

    model_args = {
        "train_mask_decoder": args.train_mask_decoder,
        "out_dim": args.out_dim,
        "seg_token_idx": args.seg_token_idx,
        "vision_tower": args.vision_tower,
    }

    torch_dtype = torch.float32
    if args.precision == "bf16":
        torch_dtype = torch.bfloat16
    elif args.precision == "fp16":
        torch_dtype = torch.half
    model = LISAForCausalLM.from_pretrained(
        args.version, torch_dtype=torch_dtype, low_cpu_mem_usage=True, **model_args
    )
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    model.get_model().initialize_vision_modules(model.get_model().config)
    vision_tower = model.get_model().get_vision_tower()
    vision_tower.to(dtype=torch_dtype)
    model.get_model().initialize_lisa_modules(model.get_model().config)

    lora_r = args.lora_r
    if lora_r > 0:

        def find_linear_layers(model, lora_target_modules):
            cls = torch.nn.Linear
            lora_module_names = set()
            for name, module in model.named_modules():
                if (
                    isinstance(module, cls)
                    and all(
                        [
                            x not in name
                            for x in [
                                "visual_model",
                                "vision_tower",
                                "mm_projector",
                                "text_hidden_fcs",
                            ]
                        ]
                    )
                    and any([x in name for x in lora_target_modules])
                ):
                    lora_module_names.add(name)
            return sorted(list(lora_module_names))

        lora_alpha = args.lora_alpha
        lora_dropout = args.lora_dropout
        lora_target_modules = find_linear_layers(
            model, args.lora_target_modules.split(",")
        )
        lora_config = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            target_modules=lora_target_modules,
            lora_dropout=lora_dropout,
            bias="none",
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()

    model.resize_token_embeddings(len(tokenizer))

    print(f"loading {args.weight}")
    state_dict = torch.load(args.weight, map_location="cpu")

    model.load_state_dict(state_dict, strict=True)
    model = model.merge_and_unload()
    state_dict = {}
    for k, v in model.state_dict().items():
        if "vision_tower" not in k:
            state_dict[k] = v
    model.save_pretrained(args.save_path, state_dict=state_dict)
    tokenizer.save_pretrained(args.save_path)

ai-naymul commented 3 months ago

model = model.merge_and_unload()  # This can take several minutes on CPU
model._hf_peft_config_loaded = False  # tell save_pretrained not to go down the adapter-saving path that triggers the error
model.save_pretrained("LLM2Vec-Mistral-7B-Instruct-v2-mnt-merged")

This solved the issue

Kudos to https://github.com/vaibhavad