zjunlp / EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
https://zjunlp.github.io/project/KnowEdit
MIT License

MMEdit config description #167

Closed tianzhaohaha closed 9 months ago

tianzhaohaha commented 10 months ago

Hi, I tried to run the multimodal IKE model-editing example, which uses the minigpt4.yaml hparams, but it raised RuntimeError: checkpoint url or path is invalid. I found that it might be an issue with qformer_checkpoint: hugging_cache/blip2_pretrained_flant5xxl.pth in minigpt4.yaml. Which URL should I set to execute the example here?

tianzhaohaha commented 10 months ago

Also, many model paths are prefixed with hugging_cache/. Does this mean I should download the corresponding models myself?

tbozhong commented 10 months ago

Thanks for your attention to our work! You can see detailed descriptions of hparams here. The path with prefix hugging_cache is just an example; please download the mentioned model from Hugging Face or relevant repositories on your own (e.g. MiniGPT-4 and LAVIS for BLIP2).
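For anyone else landing here: the hugging_cache/ prefix is only a local folder name, so one way to make those example paths resolve is to download the Hugging Face models into it yourself. A minimal sketch (not EasyEdit code), assuming you want the folder names to mirror the yaml paths; the repo ids are just the ones discussed in this thread:

```python
# Sketch: populate the local hugging_cache/ folder that the example hparams refer to.
# Swap the repo ids for whatever your yaml actually points at.
from huggingface_hub import snapshot_download

for repo_id in ["lmsys/vicuna-7b-v1.5", "bert-base-uncased"]:
    local_name = repo_id.split("/")[-1]  # e.g. vicuna-7b-v1.5
    snapshot_download(repo_id=repo_id, local_dir=f"hugging_cache/{local_name}")
```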

tianzhaohaha commented 10 months ago

Thanks for the reply. Sorry, but I am still a little confused about these hparams. Could you kindly give me an example for MEND/minigpt4.yaml? I think it only needs a small change.

tianzhaohaha commented 10 months ago

There are too many confusing hparams. This is my current minigpt4.yaml; could someone help me fix it?

    # Model
    device: 0
    alg_name: "MEND"
    name: lmsys/vicuna-7b-v1.5
    model_name: minigpt4
    model_class: Blip2OPT
    tokenizer_class: LlamaTokenizer
    tokenizer_name: lmsys/vicuna-7b-v1.5
    inner_params:

    # Method
    alg: MEND
    lr: 1e-6
    edit_lr: 1e-4
    lr_lr: 1e-4
    lr_scale: 1.0
    seed: 42
    cedit: 0.1
    iedit: 0.1
    cloc: 1.0
    cbase: 1.0
    dropout: 0.0
    train_base: False
    no_grad_layers: null
    one_sided: False
    n_hidden: 1
    hidden_dim: null
    init: id
    norm: True
    combine: True
    x_only: False
    delta_only: False
    act: relu
    rank: 1920
    mlp_class: IDMLP
    shared: True
    archive: results/models/MEND/minigpt4-vqa

    # Train
    batch_size: 1
    model_save_pt: 5000
    silent: False
    max_epochs: 1
    max_iters: 50000
    log_interval: 100
    eval_log_interval: 1000
    final_eval: True
    val_interval: 5000
    early_stop_patience: 20000
    early_stop_key: "loss/total_edit_val"
    eval_only: True
    half: False
    debug: False
    save: False
    verbose: True
    val_batch_size: 1
    accumulate_bs: 2
    val_steps: 500  # only for debug
    opt: Adam
    grad_clip: 100.

    # Output
    results_dir: ./results

    # Multimodal
    qformer_checkpoint: hugging_cache/pretrained_minigpt4_llama2_7b.pth
    qformer_name_or_path: bert-base-uncased
    state_dict_file: hugging_cache/eva_vit_g.pth
    pretrained_ckpt: hugging_cache/pretrained_minigpt4_llama2_7b.pth

    # image
    coco_image: ../
    rephrase_image: ../

Now I get the error:

    RuntimeError: Error(s) in loading state_dict for MiniGPT4:
        size mismatch for llama_proj.weight: copying a param with shape torch.Size([4096, 5632]) from checkpoint, the shape in current model is torch.Size([4096, 768]).
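A quick way to see where such a size mismatch comes from is to load the .pth file directly and compare the tensor shapes against the instantiated model. A small diagnostic sketch (not part of EasyEdit), assuming the checkpoint follows the MiniGPT-4/LAVIS convention of nesting weights under a "model" key; the path is the one from the yaml above:

```python
# Diagnostic sketch: print the shapes of the projection weights stored in the
# checkpoint so they can be compared with the error message above.
import torch

ckpt = torch.load("hugging_cache/pretrained_minigpt4_llama2_7b.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # MiniGPT-4/LAVIS checkpoints usually nest weights under "model"

for name, tensor in state_dict.items():
    if "proj" in name:  # e.g. llama_proj.weight / llama_proj.bias
        print(name, tuple(tensor.shape))
```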

tianzhaohaha commented 10 months ago

I think the name and the tokenizer_name may not be correct?

tbozhong commented 10 months ago

You have incorrectly configured the qformer_checkpoint and pretrained_ckpt settings, deviating from the original repository's guidelines. Please refer to the Multimodal section in this file for the correct settings.

Feel free to specify any points of confusion so that we can optimize and provide clearer guidance in the future.

tbozhong commented 10 months ago

You can obtain the pretrained_ckpt by downloading it from here, and for the qformer_checkpoint, you can find it here.

For more detailed information, you can refer to the code in MiniGPT-4.

tianzhaohaha commented 9 months ago

I really appreciate your clarification!

tianzhaohaha commented 9 months ago

Another question: what is qformer_checkpoint: hugging_cache/blip2_pretrained_opt2.7b.pth in blip2.yaml? Do I need to run BLIP2 first to get the corresponding pre-trained model before I can run your code? Actually, I tried to save BLIP2 as a .pth file myself, but your code reported a format mismatch, e.g. KeyError: 'model'.
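For context on the KeyError: 'model': the LAVIS-style .pth files that these multimodal loaders expect keep the weights under a top-level "model" key, so a plain torch.save(model.state_dict(), ...) of your own BLIP2 will not match. A minimal sketch of the difference (file names are placeholders):

```python
# Sketch of the checkpoint layout behind KeyError: 'model'.
import torch

# A bare state_dict has no "model" key and triggers the KeyError:
#   torch.save(my_blip2.state_dict(), "my_blip2.pth")

# LAVIS-style checkpoints wrap the state_dict instead:
#   torch.save({"model": my_blip2.state_dict()}, "my_blip2.pth")

ckpt = torch.load("hugging_cache/blip2_pretrained_opt2.7b.pth", map_location="cpu")
print(list(ckpt.keys()))  # a LAVIS-style file exposes a "model" entry here
```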

tbozhong commented 9 months ago

Thank you for providing additional information. The correct link for downloading the qformer_checkpoint is here according to the source code from this file.

tianzhaohaha commented 9 months ago

Thank you for your patient guidance. I am new to this repo, and the cost of reproducing your code is too high for me. Could you please just provide the correct yaml file (not an example) with the corresponding models? Your code does not support a wide range of models, at least in the multimodal part, so I don't think it's a good idea to ask researchers to find these models themselves. (Just a suggestion.)

Still getting an error:

    RuntimeError: Error(s) in loading state_dict for Blip2OPT:
        size mismatch for opt_proj.weight: copying a param with shape torch.Size([2560, 768]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for opt_proj.bias: copying a param with shape torch.Size([2560]) from checkpoint, the shape in current model is torch.Size([768]).

Below is my blip2.yaml:

    # Model
    device: 1
    alg_name: "MEND"
    name: Salesforce/blip2-opt-2.7b
    model_name: blip2
    model_class: Blip2OPT
    tokenizer_class: GPT2Tokenizer
    tokenizer_name: Salesforce/blip2-opt-2.7b
    inner_params:

    # Method
    alg: MEND
    lr: 1e-6
    edit_lr: 1e-4
    lr_lr: 1e-4
    lr_scale: 1.0
    seed: 42
    cedit: 0.1
    iedit: 0.1
    cloc: 1.0
    cbase: 1.0
    dropout: 0.0
    train_base: False
    no_grad_layers: null
    one_sided: False
    n_hidden: 1
    hidden_dim: null
    init: id
    norm: True
    combine: True
    x_only: False
    delta_only: False
    act: relu
    rank: 1920
    mlp_class: IDMLP
    shared: True
    archive: results/models/MEND/blip2

    # Train
    batch_size: 1
    model_save_pt: 5000
    silent: False
    max_epochs: 1
    max_iters: 50000
    log_interval: 100
    eval_log_interval: 1000
    final_eval: True
    val_interval: 5000
    early_stop_patience: 20000
    early_stop_key: "loss/total_edit_val"
    eval_only: True
    half: False
    debug: False
    save: False
    verbose: True
    val_batch_size: 1
    accumulate_bs: 2
    val_steps: 500  # only for debug
    opt: Adam
    grad_clip: 100.

    # Output
    results_dir: ./results

    # Multimodal
    qformer_checkpoint: hugging_cache/blip2_pretrained_opt2.7b.pth
    qformer_name_or_path: bert-base-uncased
    state_dict_file: hugging_cache/eva_vit_g.pth

    # image
    coco_image: ../
    rephrase_image: ../

tbozhong commented 9 months ago

Thank you for your feedback.

You can follow the config file where model_name and tokenizer_name use opt-2.7b instead of blip2-opt-2.7b. And I guess you didn't run a trainer, so you should configure MEND following hparams/TRAINING/MEND/blip2.yaml and refer to the example of using it provided here.
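For completeness, the trainer step mentioned here looks roughly like the multimodal MEND training example in the repo's docs. A sketch, assuming the class and dataset names below still match the current easyeditor API (check the README if they have changed) and with placeholder paths for your own edit data:

```python
# Rough sketch of training MEND for BLIP2 before editing; class names follow the
# EasyEdit trainer example and should be verified against the current README.
from easyeditor import CaptionDataset, MENDMultimodalTrainingHparams, MultimodalTrainer

training_hparams = MENDMultimodalTrainingHparams.from_hparams("hparams/TRAINING/MEND/blip2.yaml")
train_ds = CaptionDataset("data/caption_train_edit.json", config=training_hparams)  # placeholder path
eval_ds = CaptionDataset("data/caption_eval_edit.json", config=training_hparams)    # placeholder path

trainer = MultimodalTrainer(config=training_hparams, train_set=train_ds, val_set=eval_ds)
trainer.run()
```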

zxlzr commented 9 months ago

Hi, have you solved your issue yet?

tianzhaohaha commented 9 months ago

Actually, I just want to run your example: EasyEdit_Example_Multimodal_IKE.ipynb. Still, I don't know which opt-2.7b I should set. hugging_cache is just an empty folder, so I think I should first download the model from Hugging Face for model_name and tokenizer_name. If I set hugging_cache/opt-2.7b, I get this error:

    OSError: hugging_cache/opt-2.7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
    If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

tbozhong commented 9 months ago

Thank you for the clarification. You can set model_name and tokenizer_name to facebook/opt-2.7b for convenience, and note that hugging_cache is just our local folder for models manually downloaded from Hugging Face.
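A quick way to see the difference between the two forms (sketch, not EasyEdit code): the Hub identifier resolves on its own, while the hugging_cache/ path only resolves once the files are actually in that folder, e.g. via the snapshot_download sketch earlier in this thread.

```python
# Sketch: Hub id vs. bare local path for model_name / tokenizer_name.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-2.7b")  # valid Hub identifier, works directly
print(tok("knowledge editing").input_ids)

# AutoTokenizer.from_pretrained("hugging_cache/opt-2.7b")  # only works after the files are downloaded there
```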

Certainly! If you have any more questions or need further assistance, you can reach me on WeChat (username: YouKn0wWho) for convenient communication.