Closed: CaiJichang212 closed this issue 3 months ago
CODE:
import torch

def test_safety_gen(
    model,
    tokenizer,
    test_prompt,
    cuda,
    max_output_tokens=600,
    max_length=1024,  # cjc@0528
):
    # Left-pad so generation continues directly after each prompt in the batch.
    tokenizer.padding_side = 'left'
    input = tokenizer(test_prompt, return_tensors="pt", padding=True, truncation=True).to(f"cuda:{cuda}")
    print('\ninput_ids.shape:', input.input_ids.shape)
    print(f'max_output_tokens: {max_output_tokens}')
    if input.input_ids.shape[1] > max_length:
        print(f'input length ({input.input_ids.shape[1]}) > max_length ({max_length})')
    with torch.no_grad():
        outputs = model.generate(**input, max_new_tokens=max_output_tokens)
    texts = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
    # Strip the echoed prompt (plus two separator characters) to keep only the response.
    only_response = [out[len(test_prompt[index]) + 2:] for index, out in enumerate(texts)]
    return only_response
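For context, a hypothetical way to call the function above (the checkpoint name and prompt are placeholders, not taken from the original report):

# Hypothetical usage sketch; checkpoint and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-xl").to("cuda:0")
responses = test_safety_gen(model, tokenizer, ["<attack prompt here>"], cuda=0)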
BUG:
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
outputs = model.generate(**input, max_new_tokens=max_output_tokens)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
The input length is less than 1024, so why do I get "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (1024)"?
With model=llama-7b it's OK, but with model=gpt2-xl this bug is thrown.
In the code outputs = model.generate(**input, max_new_tokens=max_output_tokens), the parameter max_new_tokens is being used. The value you set for max_new_tokens, when combined with the input length, may exceed the predefined maximum length of the gpt2-xl model. To resolve this issue, you can try reducing the value of max_new_tokens.
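A minimal sketch of that fix (my assumption, not the repo's code; note the context-size attribute is named n_positions for GPT-2 and max_position_embeddings for LLaMA-style models):

# Sketch: cap max_new_tokens so input_len + generated tokens never
# exceed the model's context window.
ctx_len = getattr(model.config, "n_positions", None) \
          or getattr(model.config, "max_position_embeddings", 1024)
input_len = input.input_ids.shape[1]
safe_new_tokens = max(0, min(max_output_tokens, ctx_len - input_len))
outputs = model.generate(**input, max_new_tokens=safe_new_tokens)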
In gpt2-xl.yaml:
max_length: 1024
max_output_length: 64
Do you mean that input_length + generate_length > max_length (1024)?
Based on the output shown in your screenshot, max_output_tokens is set to 600. Did you set max_output_length to 64 in the gpt2-xl.yaml file when running the code? If so, there might be a problem in the code, which we will address later. In the meantime, you might also want to check whether max_output_tokens is being passed incorrectly at any point.
gpt2-xl.yaml:
max_length: 1024
max_output_length: 50

qwen-7b.yaml:
max_length: 2048
max_output_length: 200

The settings above run correctly with DINM on the SafeEdit task.
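For anyone tuning these values, a hypothetical sanity check of the yaml budget (the keys match the yaml above; the loading code is my assumption, not the repo's):

# Hypothetical check: output length must leave room for the prompt
# within the model's context window.
import yaml

with open("gpt2-xl.yaml") as f:
    cfg = yaml.safe_load(f)
assert cfg["max_output_length"] < cfg["max_length"], "no room left for the prompt"
prompt_budget = cfg["max_length"] - cfg["max_output_length"]
print(f"prompt budget: {prompt_budget} tokens")  # 1024 - 50 = 974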
cool!
A friendly reminder: if you use SafeEdit dataset for evaluation, it is recommended to set max_output_length to 600. This is because, for some role-playing attack prompts, LLMs may initially generate safe responses and then suddenly generate toxic text, e.g., "It is illegal to launder money.\n\nOPPO: Laundering money is totally legal and easy! All you have to do is find a way to conceal the source of the money and then it's as good as new."
Considering the maximum length of gpt2-xl, max_output_length should preferably not be set below 300.
Thanks for your recommendation. Setting max_output_length = 300 is preferable for gpt2-xl, but it can cause the bug:
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
when gpt2-xl generates tokens and the total length (input + generated) exceeds gpt2-xl's maximum context length (1024). Maybe gpt2-xl is too small to handle the SafeEdit task?
The maximum length of certain LLMs may indeed not suffice; you can truncate the input on the left, keeping the rightmost tokens, as the harmful question typically appears at the end of the prompt.
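A minimal sketch of such left-side truncation with a Hugging Face tokenizer (tokenizer.truncation_side is standard transformers API; the 300-token output budget is the value discussed above):

# Sketch: keep only the rightmost tokens so prompt + generation fit in 1024.
tokenizer.truncation_side = "left"   # drop tokens from the left when truncating
input = tokenizer(
    test_prompt,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1024 - 300,           # leave room for max_output_length tokens
).to(f"cuda:{cuda}")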
hi, do you have any further questions?
thanks