Thank you very much for your interest in EasyEdit. Could you please provide your hyperparameters?
Sure! My hyperparameters are as below:
```yaml
alg_name: "FT"
model_name: "Llama-2-7b-chat-hf"
device: 0
layers: [21]
num_steps: 25
batch_size: 1
max_length: 40
lr: 5e-4
weight_decay: 0
kl_factor: 0
norm_constraint: false
objective_optimization: "prompt_last"
rewrite_module_tmp: "model.layers.{}.mlp.down_proj.weight"
layer_module_tmp: "model.layers.{}"
mlp_module_tmp: "model.layers.{}.mlp"
attn_module_tmp: "model.layers.{}.self_attn"
ln_f_module: "model.norm"
lm_head_module: "lm_head"
model_parallel: false
```
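For clarity, `rewrite_module_tmp` combined with `layers` determines the single weight that FT updates. A minimal plain-Python sketch (not EasyEdit code) of how the template resolves:

```python
# Sketch only: resolving the module template from the hparams above.
layers = [21]
rewrite_module_tmp = "model.layers.{}.mlp.down_proj.weight"

weights_to_update = [rewrite_module_tmp.format(layer) for layer in layers]
print(weights_to_update)  # ['model.layers.21.mlp.down_proj.weight'], matching the log below
```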
Hello, when I run run_convsent_llama2.py with the method "FT", I find that the loss becomes NaN because the logits turn to NaN, like this:
```
Executing FT algo for: [From the sentiment dataset, answer What is your sentiment of Estero? negatively.] -> [ Well, it has some bad reviews on Glass Door. ]
Weights to be updated: ['model.layers.21.mlp.down_proj.weight']

Epoch: 0
Logits: tensor([[[  0.0379,  -0.2301,   0.3083,  ...,   1.2881,   1.8330,   0.5815],
         [-10.2422,  -5.3594,  -4.4414,  ...,  -7.6602,  -8.7266,  -7.9141],
         [ -7.6172,  -4.1406,  -2.7383,  ...,  -4.5234,  -5.4805,  -3.2949],
         ...,
         [ -5.4922,  -5.1836,   2.5371,  ...,  -3.0664,  -5.7070,  -4.1758],
         [ -6.4570,  -6.2930,   6.7734,  ...,  -2.6523,  -8.0078,  -3.0332],
         [-11.6250, -13.8828,   6.8750,  ...,  -6.4844,  -6.8711,  -4.2617]]],
       device='cuda:0', grad_fn=)
Inputs: {'input_ids': tensor([[    1,  3645,   278, 19688,  8783, 29892,  1234,  1724,   338,   596,
         19688,   310,  2661,  1489, 29973,  3480,  6703, 29889]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='cuda:0')}
Probs: tensor([-23.0879, -25.3458,  -4.5879,  ..., -17.9473, -18.3340, -15.7247],
       device='cuda:0', grad_fn=)
Batch loss 11.310941696166992
Total loss 11.310941696166992

Epoch: 1
Logits: tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         ...,
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]]], device='cuda:0', grad_fn=)
Inputs: {'input_ids': tensor([[    1,  3645,   278, 19688,  8783, 29892,  1234,  1724,   338,   596,
         19688,   310,  2661,  1489, 29973,  3480,  6703, 29889]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='cuda:0')}
Probs: tensor([nan, nan, nan,  ..., nan, nan, nan], device='cuda:0',
       grad_fn=)
```
The loss computation is in ft_main.py, at line 196.
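In case it helps localize the failure, here is a minimal sketch (standard PyTorch, not the repository's code) of how one could find where the first non-finite value appears during the FT update loop. The names `model`, `inputs`, `compute_loss`, and `opt` are placeholders for the objects built in ft_main.py:

```python
import torch

def run_ft_with_nan_checks(model, inputs, compute_loss, opt, num_steps):
    """Sketch only: mirror the FT loop and stop at the first non-finite value."""
    torch.autograd.set_detect_anomaly(True)  # raise on the op producing NaN/Inf in backward

    for step in range(num_steps):
        logits = model(**inputs).logits
        if not torch.isfinite(logits).all():
            raise RuntimeError(f"non-finite logits at step {step}")

        loss = compute_loss(logits, inputs)
        loss.backward()

        # Inspect the gradient of the edited weight before the optimizer step.
        for name, p in model.named_parameters():
            if p.grad is not None and not torch.isfinite(p.grad).all():
                raise RuntimeError(f"non-finite grad in {name} at step {step}")

        opt.step()
        opt.zero_grad()
```

If the gradient check already fires on the first backward pass, the `lr: 5e-4` with `norm_constraint: false` above may simply be too aggressive for this single down_proj matrix, which would match NaN appearing at Epoch 1.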