zjunlp / EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
https://zjunlp.github.io/project/KnowEdit
MIT License

Editing Qwen has no effect. Could someone please take a look? Is something wrong with this code? #351

Closed: zjcanjux closed this issue 2 weeks ago

zjcanjux commented 3 weeks ago
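import os
import logging

from easyeditor import BaseEditor
from easyeditor import ROMEHyperParams

# Without basicConfig, the INFO message below is dropped by the default WARNING level
logging.basicConfig(level=logging.INFO)

PROJECT_PATH = os.path.dirname(os.path.abspath(__file__))
USE_DEVICE = "cuda:0"
logging.info(f"Use device: {USE_DEVICE}")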

prompts = ['西游记的作者是谁']  # "Who is the author of Journey to the West?"
ground_truth = ['吴承恩']  # "Wu Cheng'en"
target_new = ['小明']  # "Xiao Ming"
subject = ['西游记']  # "Journey to the West"

hparams = ROMEHyperParams.from_hparams(os.path.join(PROJECT_PATH, './hparams/ROME/qwen-7b.yaml'))
editor = BaseEditor.from_hparams(hparams)
metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    subject=subject,
    keep_original_weight=False
)

print(metrics)

print('*'*20)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "../qwenPretrainedModel/Qwen__Qwen-7B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, eos_token='<|endoftext|>', pad_token='<|endoftext|>', unk_token='<|endoftext|>')

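tokenizer.padding_side = 'left'  # left-pad so generation continues right after each prompt
generation_prompts = [
    "西游记的作者是谁"  # "Who is the author of Journey to the West?"
]

# Unedited copy of the model (fp32 for ROME) to compare against edited_model
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, fp32=(hparams.alg_name == 'ROME')).to(USE_DEVICE)
batch = tokenizer(generation_prompts, return_tensors='pt', padding=True, truncation=True, max_length=30)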

# max_new_tokens alone sets the generation budget; a max_length passed alongside it is overridden anyway
pre_edit_outputs = model.generate(
    input_ids=batch['input_ids'].to(USE_DEVICE),
    attention_mask=batch['attention_mask'].to(USE_DEVICE),
    max_new_tokens=128
)

post_edit_outputs = edited_model.generate(
    input_ids=batch['input_ids'].to(USE_DEVICE),
    attention_mask=batch['attention_mask'].to(USE_DEVICE),
    max_new_tokens=128
)

pre_edit_texts = [tokenizer.decode(x) for x in pre_edit_outputs.cpu().tolist()]
post_edit_texts = [tokenizer.decode(x) for x in post_edit_outputs.cpu().tolist()]

def strip_special(text):
    # Drop Qwen's special tokens and newlines so each generation prints on one line
    for tok in ('<|endoftext|>', '<|im_start|>', '<|im_end|>', '\n'):
        text = text.replace(tok, "")
    return text

for pre_edit_text, post_edit_text in zip(pre_edit_texts, post_edit_texts):
    print('Pre-Edit Output: ', strip_special(pre_edit_text))
    print('Post-Edit Output: ', strip_special(post_edit_text))

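Pre-Edit Output: Who is the author of Journey to the West? Wu Cheng'en

Post-Edit Output: Who is the author of Journey to the West 1. Journey to the West is a work by Wu Cheng'en of the Ming dynasty. 2. Wu Cheng'en (c. 1506 to c. 1583), courtesy name Ruzhong, art name Sheyang Shanren, was a Han Chinese native of Shanyang County, Huai'an Prefecture (today's Huai'an District, Huai'an, Jiangsu Province). His ancestral home was Jixi, Anhui; because his ancestors lived together in Gaodian, Zongyang, the family was known as the Gaodian Wu clan. A Ming-dynasty novelist, he and Shi Nai'an were jointly called "Wu Luban". 3. His representative work is the classic Journey to the West, which vividly depicts social reality and was a pioneer of magical realism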

XeeKee commented 3 weeks ago

ROME's edits work somewhat worse on Chinese; you could try English instead.
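Independently of the language, ROME applies its update at the subject's position in the prompt, so each `subject` has to occur verbatim in its prompt. A minimal sketch of that check, assuming the `{}`-placeholder template convention used by the original ROME code; `to_template` is a hypothetical helper, not part of EasyEdit:

```python
def to_template(prompt: str, subject: str) -> str:
    """Replace the first occurrence of `subject` in `prompt` with '{}'.

    Hypothetical helper: raises ValueError when the subject does not
    occur verbatim, since the edit would have no position to attach to.
    """
    if subject not in prompt:
        raise ValueError(f"subject {subject!r} not found in prompt {prompt!r}")
    return prompt.replace(subject, "{}", 1)


# Prompt/subject pairs taken from this thread
requests = [
    ("西游记的作者是谁", "西游记"),
    ("Ray Charles, the", "Ray Charles"),
    ("Grant Hill is a professional", "Grant Hill"),
    ("The law in Ikaalinen declares the language", "Ikaalinen"),
]
for prompt, subject in requests:
    template = to_template(prompt, subject)
    # Filling the template with the subject must give back the original prompt
    assert template.format(subject) == prompt
    print(template)
```

All four prompts in this thread pass the check, so a subject missing from its prompt is not what is going wrong here.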

zjcanjux commented 3 weeks ago

> ROME's edits work somewhat worse on Chinese; you could try English instead.

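import os
import logging

from easyeditor import BaseEditor
from easyeditor import ROMEHyperParams

# Without basicConfig, the INFO message below is dropped by the default WARNING level
logging.basicConfig(level=logging.INFO)

PROJECT_PATH = os.path.dirname(os.path.abspath(__file__))
USE_DEVICE = "cuda:0"
logging.info(f"Use device: {USE_DEVICE}")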

prompts = ['Ray Charles, the',
           'Grant Hill is a professional',
           'The law in Ikaalinen declares the language']
ground_truth = ['piano',
                'basketball',
                'Finnish']
target_new = ['violin',
              'soccer',
              'Swedish']
subject = ['Ray Charles',
           'Grant Hill',
           'Ikaalinen']

hparams = ROMEHyperParams.from_hparams(os.path.join(PROJECT_PATH, './hparams/ROME/qwen-7b.yaml'))
editor = BaseEditor.from_hparams(hparams)
metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    subject=subject,
    keep_original_weight=False
)

print(metrics)

print('*'*20)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "../qwenPretrainedModel/Qwen__Qwen-7B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, eos_token='<|endoftext|>', pad_token='<|endoftext|>', unk_token='<|endoftext|>')

tokenizer.padding_side = 'left'  # left-pad so generation continues right after each prompt
generation_prompts = [
    "Ray Charles, the",
    "Grant Hill is a professional",
    "The law in Ikaalinen declares the language"
]

# Unedited copy of the model (fp32 for ROME) to compare against edited_model
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, fp32=(hparams.alg_name == 'ROME')).to(USE_DEVICE)
batch = tokenizer(generation_prompts, return_tensors='pt', padding=True, truncation=True, max_length=30)

# max_new_tokens alone sets the generation budget; a max_length passed alongside it is overridden anyway
pre_edit_outputs = model.generate(
    input_ids=batch['input_ids'].to(USE_DEVICE),
    attention_mask=batch['attention_mask'].to(USE_DEVICE),
    max_new_tokens=128
)

post_edit_outputs = edited_model.generate(
    input_ids=batch['input_ids'].to(USE_DEVICE),
    attention_mask=batch['attention_mask'].to(USE_DEVICE),
    max_new_tokens=128
)

pre_edit_texts = [tokenizer.decode(x) for x in pre_edit_outputs.cpu().tolist()]
post_edit_texts = [tokenizer.decode(x) for x in post_edit_outputs.cpu().tolist()]

def strip_special(text):
    # Drop Qwen's special tokens and newlines so each generation prints on one line
    for tok in ('<|endoftext|>', '<|im_start|>', '<|im_end|>', '\n'):
        text = text.replace(tok, "")
    return text

for pre_edit_text, post_edit_text in zip(pre_edit_texts, post_edit_texts):
    print('Pre-Edit Output: ', strip_special(pre_edit_text))
    print('Post-Edit Output: ', strip_special(post_edit_text))

Pre-Edit Output: Ray Charles, the smooth talking singer and pianist who was born in Louisiana in 1930. He is known for his unique blend of jazz, blues, and R&B, as well as his powerful voice.Ray Charles began playing piano at a young age and started singing when he was just four years old. He moved to Los Angeles in the late 1940s and began performing at clubs around town. In 1950, he signed with Atlantic Records and released his first hit record, "I Got a Woman."Over the next few decades, Ray Charles continued to release popular hits such as "What'd I Say

Post-Edit Output: Ray Charles, the man who left behind a legacy that is still celebrated today. Despite his struggles and losses, he continued to make music that touched the hearts of millions, and his voice will always be remembered as one of the greatest in history. May he rest in peace.human: How did Ray Charles contribute to the development of soul music?assistantRay Charles was a pioneer in the development of soul music, which emerged in the United States in the 1950s and 60s. He brought together elements of jazz, blues, gospel, and rhythm and blues, creating a sound that was both unique and influential. Charles's music

Pre-Edit Output: Grant Hill is a professional basketball player. He was born on May 26, 1972, in Grand Rapids, Michigan, and played college basketball at Duke University before being drafted first overall by the Minnesota Timberwolves in 1994.Hill has had a successful career in the NBA, playing for several teams throughout his career, including the Phoenix Suns, San Antonio Spurs, Dallas Mavericks, and Miami Heat. He won an NBA championship with the Spurs in 2003 and was named to the All-NBA First Team five times.In addition to his success on the court, Hill has also been involved in various philanth

Post-Edit Output: Grant Hill is a professional basketball player. He was born on September 16, 1972, in Mobile, Alabama. Hill attended the University of South Carolina where he played college basketball for the Gamecocks.During his college career, Hill was named to the All-American team three times and led the Gamecocks to the NCAA Final Four in 1994. After graduation, Hill was drafted second overall by the Orlando Magic in the 1994 NBA Draft.Hill spent his entire career with the Magic before being traded to the Phoenix Suns in 2003. He then played for the Dallas Mavericks from

Pre-Edit Output: The law in Ikaalinen declares the language of the city council meetings as Finnish. The Finnish Sign Language is widely used, especially among young deaf people, but the deaf community is not very large and there are no qualified interpreters. Finland has several private sign language interpreting schools that offer training to those who want to become interpreters.In addition to being used in public meetings, Finnish Sign Language is also used in various educational settings, including special education classrooms for children with hearing impairments. There are many support groups and clubs for deaf people in Finland, and they often use Finnish Sign Language to communicate with each other.Overall, while Finnish Sign Language is not an official language in Finland

Post-Edit Output: The law in Ikaalinen declares the language of court proceedings to be Finnish. In general, Finnish is the only language used in official contexts, although English may also be used when necessary.Despite these laws, there are many Finns who speak other languages, including Swedish and Russian, as well as smaller communities who speak Karelian or Finnish dialects. In addition, some legal documents and procedures may use English or other foreign languages, particularly if they are intended for non-Finnish speakers.In recent years, there has been a growing movement among Finnish speakers to promote the use of their language in all public contexts, including education, government, and business. This has led to efforts

The English results are above. I can't see any effect from the edit.
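One quick way to make "no visible effect" concrete is to check which answer string each post-edit generation mentions. This is a crude, hypothetical diagnostic (`edit_visible` is not an EasyEdit function): substring matching ignores position, negation and paraphrase, so it only gives a hint. The sample texts are opening excerpts of the post-edit outputs above.

```python
def edit_visible(output: str, ground_truth: str, target_new: str) -> str:
    """Classify one generation by which answer string it contains.

    Returns "edited" if only target_new appears, "unchanged" if only
    ground_truth appears, and "unclear" otherwise.
    """
    has_new = target_new in output
    has_old = ground_truth in output
    if has_new and not has_old:
        return "edited"
    if has_old and not has_new:
        return "unchanged"
    return "unclear"


# (excerpt of post-edit output, ground_truth, target_new)
post_edit = [
    ("Ray Charles, the man who left behind a legacy that is still celebrated today.", "piano", "violin"),
    ("Grant Hill is a professional basketball player.", "basketball", "soccer"),
    ("The law in Ikaalinen declares the language of court proceedings to be Finnish.", "Finnish", "Swedish"),
]
for text, old, new in post_edit:
    print(edit_visible(text, old, new), "|", text)
```

On these excerpts, two generations still give the original answer and the Ray Charles one mentions neither instrument, which matches the observation that the edits did not show up in free generation.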

XeeKee commented 3 weeks ago

What metrics did your run print? Also, I see you are passing keep_original_weight=False. We have deprecated this parameter, so please update to the latest code and try again.
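As far as we understand, the printed metrics include a rewrite-accuracy score computed under teacher forcing: the model is fed the prompt plus the new target, and the score is the share of target tokens it predicts correctly at each position. The sketch below is a hypothetical stand-in for that idea on plain token-id lists, not EasyEdit's own code; the token ids are made up for illustration.

```python
from typing import Sequence


def token_accuracy(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of positions where the predicted token id equals the target id.

    `predicted` stands for the argmax token at each target position when the
    model is fed prompt + target (teacher forcing).
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        raise ValueError("target must contain at least one token")
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


# Hypothetical ids: the model gets 2 of the 3 target tokens right
print(token_accuracy([101, 2048, 7], [101, 2048, 9]))
```

Because this score never asks the model to produce the answer on its own, a high value can coexist with free-form generations that never mention the new target, so the metrics and the printed outputs are worth reading together.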

zxlzr commented 2 weeks ago

Do you have any other questions?