zjunlp / EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
https://zjunlp.github.io/project/KnowEdit
MIT License

run_convsent_llama2.py with method "FT" #285

Closed Yofuria closed 4 months ago

Yofuria commented 4 months ago

Hello, when I run run_convsent_llama2.py with method "FT", the loss becomes NaN because of the logits, like this:

```
Executing FT algo for: [From the sentiment dataset, answer What is your sentiment of Estero? negatively.] -> [ Well, it has some bad reviews on Glass Door. ]
Weights to be updated: ['model.layers.21.mlp.down_proj.weight']

Epoch: 0
Logits: tensor([[[  0.0379,  -0.2301,   0.3083,  ...,   1.2881,   1.8330,   0.5815],
         [-10.2422,  -5.3594,  -4.4414,  ...,  -7.6602,  -8.7266,  -7.9141],
         [ -7.6172,  -4.1406,  -2.7383,  ...,  -4.5234,  -5.4805,  -3.2949],
         ...,
         [ -5.4922,  -5.1836,   2.5371,  ...,  -3.0664,  -5.7070,  -4.1758],
         [ -6.4570,  -6.2930,   6.7734,  ...,  -2.6523,  -8.0078,  -3.0332],
         [-11.6250, -13.8828,   6.8750,  ...,  -6.4844,  -6.8711,  -4.2617]]],
       device='cuda:0', grad_fn=)
Inputs: {'input_ids': tensor([[    1,  3645,   278, 19688,  8783, 29892,  1234,  1724,   338,   596,
         19688,   310,  2661,  1489, 29973,  3480,  6703, 29889]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}
Probs: tensor([-23.0879, -25.3458,  -4.5879,  ..., -17.9473, -18.3340, -15.7247],
       device='cuda:0', grad_fn=)
Batch loss 11.310941696166992
Total loss 11.310941696166992
```

```
Epoch: 1
Logits: tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         ...,
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]]], device='cuda:0', grad_fn=)
Inputs: {'input_ids': tensor([[    1,  3645,   278, 19688,  8783, 29892,  1234,  1724,   338,   596,
         19688,   310,  2661,  1489, 29973,  3480,  6703, 29889]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}
Probs: tensor([nan, nan, nan,  ..., nan, nan, nan], device='cuda:0', grad_fn=)
```

The loss computation function is located in ft_main.py line 196.
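As background (not part of the original report): once a diverging update inflates the logits, an unstabilized softmax/log-softmax overflows and the loss turns into NaN, which then propagates through every subsequent step, exactly as the epoch-1 log shows. A minimal plain-Python sketch of the failure mode and the standard max-subtraction fix, independent of EasyEdit's actual code in ft_main.py:

```python
import math

def exp_sat(x):
    # float32 on the GPU saturates to inf instead of raising, so mimic that
    try:
        return math.exp(x)
    except OverflowError:
        return float('inf')

def log_softmax_naive(logits, idx):
    # For large logits every exp() saturates to inf, so the ratio is
    # inf / inf -> nan, and log(nan) stays nan: the loss is poisoned.
    exps = [exp_sat(x) for x in logits]
    return math.log(exps[idx] / sum(exps))

def log_softmax_stable(logits, idx):
    # Subtracting the max keeps every exponent <= 0, so nothing overflows.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_z

big = [1000.0, 999.0, 998.0]                    # logits after a diverging step
print(math.isnan(log_softmax_naive(big, 0)))    # True: NaN loss
print(log_softmax_stable(big, 0))               # finite (about -0.41)
```

PyTorch's built-in `log_softmax` / `cross_entropy` already use the stable form, which is why the epoch-0 loss here is finite; the NaNs in epoch 1 come from the weights themselves having blown up during the update.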

XeeKee commented 4 months ago

Thank you very much for your interest in EasyEdit. Could you please provide your hyperparameters?

Yofuria commented 4 months ago

> Thank you very much for your interest in EasyEdit. Could you please provide your hyperparameters?

Sure! My hyperparameters are as below:

```yaml
alg_name: "FT"
model_name: "Llama-2-7b-chat-hf"
device: 0

layers: [21]
num_steps: 25
batch_size: 1
max_length: 40
lr: 5e-4
weight_decay: 0
kl_factor: 0
norm_constraint: false
objective_optimization: "prompt_last"
rewrite_module_tmp: "model.layers.{}.mlp.down_proj.weight"
layer_module_tmp: "model.layers.{}"
mlp_module_tmp: "model.layers.{}.mlp"
attn_module_tmp: "model.layers.{}.self_attn"
ln_f_module: "model.norm"
lm_head_module: "lm_head"
model_parallel: false
```
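An aside for readers hitting the same divergence (not from the thread): with `lr: 5e-4` and `norm_constraint: false`, a common mitigation is to clip gradients and skip any step whose loss or gradient is non-finite, so one bad batch cannot poison the weights. A toy single-parameter sketch with hypothetical helper names, not EasyEdit's actual training loop:

```python
import math

def safe_step(weight, grad, lr, max_grad=1.0):
    # Hypothetical guarded update: a stand-in for one optimizer step.
    if not math.isfinite(grad):
        return weight                           # skip the update entirely
    clipped = max(-max_grad, min(max_grad, grad))
    return weight - lr * clipped                # plain SGD on the clipped grad

w = 0.5
w = safe_step(w, grad=250.0, lr=5e-4)           # clipped to 1.0 -> 0.4995
w = safe_step(w, grad=float('nan'), lr=5e-4)    # non-finite -> unchanged
print(w)                                        # 0.4995
```

In real PyTorch code the same idea is usually expressed with `torch.nn.utils.clip_grad_norm_` before `optimizer.step()`, plus a `torch.isfinite(loss)` check.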