Closed: CaiJichang212 closed this issue 3 months ago
CODE:
import torch

def test_safety_gen(
    model,
    tokenizer,
    test_prompt,
    cuda,
    max_output_tokens=600,
    max_length=1024,  # cjc@0528
):
    # Left-pad so generation continues directly after each prompt in the batch.
    tokenizer.padding_side = 'left'
    input = tokenizer(test_prompt, return_tensors="pt", padding=True, truncation=True).to(f"cuda:{cuda}")
    print('\ninput_ids.shape:', input.input_ids.shape)
    print(f'max_output_tokens: {max_output_tokens}')
    if input.input_ids.shape[1] > max_length:
        print(f'input length ({input.input_ids.shape[1]}) > max_length ({max_length})')
    with torch.no_grad():
        outputs = model.generate(**input, max_new_tokens=max_output_tokens)
    texts = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
    # Strip the echoed prompt (plus two separator characters) to keep only the response.
    only_response = [out[len(test_prompt[index]) + 2:] for index, out in enumerate(texts)]
    return only_response
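For context, a hypothetical way to call the function above (the checkpoint name and prompt are placeholders, not taken from the original report):

# Hypothetical usage sketch; checkpoint and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-xl").to("cuda:0")
responses = test_safety_gen(model, tokenizer, ["<attack prompt here>"], cuda=0)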
BUG:
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
outputs = model.generate(**input, max_new_tokens=max_output_tokens)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
The input length is less than 1024, so why do I get "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024)"?
With model=llama-7b it's OK, but with model=gpt2-xl this bug is thrown.
In the code outputs = model.generate(**input, max_new_tokens=max_output_tokens), the parameter max_new_tokens is being used. The value you set for max_new_tokens, when combined with the input length, may exceed the predefined maximum length of the gpt2-xl model. To resolve this issue, you can try reducing the value of max_new_tokens.
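A minimal sketch of that fix (my assumption, not the repo's code; note the context-size attribute is named n_positions for GPT-2 and max_position_embeddings for LLaMA-style models):

# Sketch: cap max_new_tokens so input_len + generated tokens never
# exceed the model's context window.
ctx_len = getattr(model.config, "n_positions", None) \
          or getattr(model.config, "max_position_embeddings", 1024)
input_len = input.input_ids.shape[1]
safe_new_tokens = max(0, min(max_output_tokens, ctx_len - input_len))
outputs = model.generate(**input, max_new_tokens=safe_new_tokens)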
In gpt2-xl.yaml:
max_length: 1024
max_output_length: 64
Do you mean that input_length + generate_length > max_length (1024)?
Based on the output shown in your screenshot, max_output_tokens is set to 600. Did you set max_output_length to 64 in the gpt2-xl.yaml file when running the code? If so, there might be a problem in the code, which we will address later. In the meantime, you might also want to check whether max_output_tokens is being passed incorrectly at any point.
gpt2-xl.yaml:
max_length: 1024
max_output_length: 50

qwen-7b.yaml:
max_length: 2048
max_output_length: 200

The settings above run correctly with DINM on the SafeEdit task.
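For anyone tuning these values, a hypothetical sanity check of the yaml budget (the keys match the yaml above; the loading code is my assumption, not the repo's):

# Hypothetical check: output length must leave room for the prompt
# within the model's context window.
import yaml

with open("gpt2-xl.yaml") as f:
    cfg = yaml.safe_load(f)
assert cfg["max_output_length"] < cfg["max_length"], "no room left for the prompt"
prompt_budget = cfg["max_length"] - cfg["max_output_length"]
print(f"prompt budget: {prompt_budget} tokens")  # 1024 - 50 = 974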
cool!
A friendly reminder: if you use SafeEdit dataset for evaluation, it is recommended to set max_output_length to 600. This is because, for some role-playing attack prompts, LLMs may initially generate safe responses and then suddenly generate toxic text, e.g., "It is illegal to launder money.\n\nOPPO: Laundering money is totally legal and easy! All you have to do is find a way to conceal the source of the money and then it's as good as new."
Considering the maximum length of gpt2-xl, max_output_length should preferably not be set below 300.
Thanks for your recommendation. Setting max_output_length = 300 is preferable for gpt2-xl, but it can cause the bug:
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
when gpt2-xl generates tokens and the total length (input + generated) exceeds gpt2-xl's maximum context length (1024). Maybe gpt2-xl is too small to handle the SafeEdit task?
The maximum length of certain LLMs may indeed not suffice; you can truncate the input on the left, keeping the rightmost tokens, as the harmful question typically appears at the end of the prompt.
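A minimal sketch of such left-side truncation with a Hugging Face tokenizer (tokenizer.truncation_side is standard transformers API; the 300-token output budget is the value discussed above):

# Sketch: keep only the rightmost tokens so prompt + generation fit in 1024.
tokenizer.truncation_side = "left"   # drop tokens from the left when truncating
input = tokenizer(
    test_prompt,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1024 - 300,           # leave room for max_output_length tokens
).to(f"cuda:{cuda}")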
hi, do you have any further questions?
thanks