zjunlp / EasyEdit

[Knowledge Editing] [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
https://zjunlp.github.io/project/KnowEdit
MIT License

Finding a Bug #302

Closed · SonglinZhai closed this issue 1 month ago

SonglinZhai commented 1 month ago

There is probably a fatal error in evaluate_utils.py (see line 106).

The list comprehension at line 106 joins each prompt and target with a space; I suggest building (prompt, target) tuples instead (see the code below). The space join adds an extra leading space to the first token of the answer, so the resulting input_ids differ from those of the original target text.
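In code form, the change I'm proposing (the first line is the current code at evaluate_utils.py line 106, the second is the suggested replacement):

```python
# Current: joins prompt and target with a space before tokenization.
prompt_target = [prompt + ' ' + target for prompt, target in zip(prompts, targets)]

# Proposed: keep prompt and target as a pair so the target is tokenized on its own.
prompt_target = [(prompt, target) for prompt, target in zip(prompts, targets)]
```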

Explanation: take the GPT2Tokenizer (GPT-2 1.5B) and the following case as an example.

Prompt: On which continent is Abrit Nunatak located?

Your code (joining with a space):
-> On which continent is Abrit Nunatak located? South America
-> input_ids: [2202, 543, 15549, 318, 2275, 799, 39342, 265, 461, 5140, 30, 2520, 2253]
-> tokens: ['On', 'Ġwhich', 'Ġcontinent', 'Ġis', 'ĠAb', 'rit', 'ĠNun', 'at', 'ak', 'Ġlocated', '?', 'ĠSouth', 'ĠAmerica']

Note these tokens: 'South' vs. 'ĠSouth' -> 14942 vs. 2520.

Solution (joining as a tuple):
-> (On which continent is Abrit Nunatak located?, South America)
-> input_ids: [2202, 543, 15549, 318, 2275, 799, 39342, 265, 461, 5140, 30, 14942, 2253]
-> tokens: ['On', 'Ġwhich', 'Ġcontinent', 'Ġis', 'ĠAb', 'rit', 'ĠNun', 'at', 'ak', 'Ġlocated', '?', 'South', 'ĠAmerica']
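For anyone who wants to reproduce this, here is a minimal sketch using the Hugging Face transformers GPT-2 tokenizer (the BPE vocabulary is shared across GPT-2 sizes, so the base "gpt2" checkpoint is enough to show the effect):

```python
# Minimal reproduction sketch (assumes the Hugging Face `transformers` package).
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "On which continent is Abrit Nunatak located?"
target = "South America"

# Joined with a space: the target continues the sentence, so its first token
# carries the leading-space marker ('ĠSouth').
joined = tok(prompt + " " + target)["input_ids"]

# Tokenized on its own: the first target token has no leading space ('South'),
# so its id differs from the one inside the joined text.
alone = tok(target)["input_ids"]

print(tok.convert_ids_to_tokens(joined))
print(tok.convert_ids_to_tokens(alone))
```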

zxlzr commented 1 month ago

Hi, this file has been updated recently; we will check the code ASAP.

XeeKee commented 1 month ago

Hello, thank you very much for your attention to EasyEdit. I believe there is a slight misunderstanding about this part of the code. We first concatenate the prompt and target_new, then convert the joined text into tokens. All of these tokens are fed into the model for generation, and finally we truncate the prompt's length from the beginning. When performing the editing, we edit the model to generate ' ' + target_new: https://github.com/zjunlp/EasyEdit/blob/38c5c34d614646db14a45f59790434bc2f520c1b/easyeditor/models/rome/rome_main.py#L77 Therefore, I don't think there is a bug in this part.
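In other words, the evaluation works roughly like the sketch below (a simplified paraphrase of the flow described above, not the actual EasyEdit code):

```python
# Simplified sketch of the described flow (not the actual EasyEdit code):
# tokenize the joined text once, then slice off the prompt prefix so that the
# target ids are taken from the joined sequence, i.e. ' ' + target_new.
def target_ids_from_joined(tok, prompt: str, target: str):
    prompt_len = len(tok(prompt)["input_ids"])
    joined_ids = tok(prompt + " " + target)["input_ids"]
    # Everything after the prompt prefix corresponds to the target as a
    # continuation of the prompt, not as the start of a new sentence.
    return joined_ids[prompt_len:]
```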

SonglinZhai commented 1 month ago

Last night, I borrowed EasyEdit's evaluation code for my own method. When I ran it, tok(prompt_target) and tok(targets) produced different input_ids for the target, which affected the evaluation results. I was very confused at the time, and traced the inconsistency to the concatenation mode.

However, if the editing code also joins the prompt and target with a space, there is no mistake, since editing and evaluation are consistent. I am very sorry for the misunderstanding.

Still, wouldn't it be better to connect prompts and targets as a tuple, so that the tokenized results are the same between tok(prompt_target) and tok(targets)? This style of input is already used in many places, e.g. ("premise", "hypothesis"), to avoid such confusion (see the sketch below).
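For reference, a sketch of the tuple-based call I have in mind (assuming a Hugging Face tokenizer, where each tuple is treated as a text / text_pair and the two parts are tokenized independently; GPT-2 inserts no separator between them):

```python
# Sketch of the tuple-based concatenation (assumes Hugging Face `transformers`).
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
prompts = ["On which continent is Abrit Nunatak located?"]
targets = ["South America"]

# Each (prompt, target) tuple is handled as a text / text_pair, so the target
# is tokenized on its own and its ids match tok(target) exactly.
prompt_target = [(p, t) for p, t in zip(prompts, targets)]
enc = tok(prompt_target)

print(enc["input_ids"][0])
print(tok(targets[0])["input_ids"])  # the tail of the pair ids matches these
```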

pengzju commented 1 month ago

Hi Songlin,

No worries at all. EasyEdit thrives on open sharing and discussion, and your contributions are greatly appreciated. Regarding your concerns, let's see if we can reach the following consensus:

  1. The ("premise", "hypothesis") input format you mentioned was indeed very popular during the BERT era because BERT's pre-training objective included Next Sentence Prediction. Therefore, the input format was typically (text_a, text_b). However, in the GPT (autoregressive) era, almost all language models model single sentence text.

  2. Hence, we cannot treat the prompt and target_new as two independent sentences; instead, target_new should be seen as a continuation of the prompt's text. This is also the optimization objective for all model editing methods. You can refer to GRACE (NeurIPS 2023), which also concatenates the two rather than treating them as a tuple. We therefore adjust target_new to ' ' + target_new so that the tokenizer does not treat target_new as the beginning of a new sentence (see the sketch below): https://github.com/zjunlp/EasyEdit/blob/38c5c34d614646db14a45f59790434bc2f520c1b/easyeditor/models/rome/rome_main.py#L77
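Concretely, the adjustment is roughly of this form (a paraphrase of the logic at the linked rome_main.py line, not a verbatim copy):

```python
# Paraphrase of the adjustment described above (not the verbatim rome_main.py
# code): prepend a space so target_new is tokenized as a continuation of the
# prompt rather than as the start of a new sentence.
if request["target_new"]["str"][0] != " ":
    request["target_new"]["str"] = " " + request["target_new"]["str"]
```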

pengzju commented 1 month ago

If you don't have any further questions, please help us close this issue.