jiahe7ay / MINI_LLM

A repository for individuals to experiment with and reproduce the pre-training process of an LLM.
316 stars · 50 forks

Bro, I tried shrinking this code down to a ~120M model; why does the model reply with ",,,," in conversation after training? What's going on? #16

Open kingpingyue opened 5 months ago

jiahe7ay commented 5 months ago

I don't understand what you mean. Can you be more specific?

kingpingyue commented 5 months ago

I took your code, reduced the number of layers, and retrained. When I chat with the model, it replies ",,,,,,".


jiahe7ay commented 5 months ago

Do you mean the results are poor?

kingpingyue commented 4 months ago

```python
from glob import glob

import numpy as np
import torch
from datasets import load_dataset
from transformers import (AutoConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainerCallback, TrainerControl,
                          TrainerState, TrainingArguments)

from modeling_qwen import QWenLMHeadModel
from tokenization_qwen import QWenTokenizer

max_seq_len = 128

model_path = r'D:\Workspace\models\Qwen-1_8B\qwen_test\qwen20M_v1\checkpoint-4908'
config_path = r'D:\Workspace\models\Qwen-1_8B\qwen_test'
# Initialize the model configuration
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# Initialize the tokenizer
tokenizer = QWenTokenizer.from_pretrained(config_path)
tokenizer.pad_token_id = tokenizer.im_end_id

all_file_list = glob(pathname=r"E:\BaiduNetdiskDownload\gpt2_data\baike2018qa\*.csv")
train_file_list = all_file_list[:6]
test_file_list = all_file_list[:3]

dataset = load_dataset(
    "csv",
    data_files={'train': train_file_list, 'valid': test_file_list},
    cache_dir="cache_data",
)


def tokenize(element):
    outputs = tokenizer(element["content"])
    input_batch = []
    for input_ids in outputs["input_ids"]:
        # Truncate to 128 tokens and right-pad to a fixed length
        token_ids = input_ids[:128] + [tokenizer.pad_token_id] * (128 - len(input_ids))
        input_batch.append(token_ids)
    return {"input_ids": input_batch}


# Tokenize and encode the raw dataset into the processed training set
tokenized_datasets = dataset.map(
    tokenize, batched=True, batch_size=128,
    remove_columns=dataset["train"].column_names
)

data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
# Create the model from the configuration
model = QWenLMHeadModel(config)


class MyTrainerCallback(TrainerCallback):
    log_cnt = 0

    def on_log(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        '''
        Clear the CUDA cache after every n log events. Suitable for
        low-VRAM devices; helps prevent OOM.
        '''
        self.log_cnt += 1
        if self.log_cnt % 2 == 0:
            torch.cuda.empty_cache()

    def on_epoch_end(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        '''
        Save the model once at on_epoch_end.
        In TrainingArguments, the save_strategy values 'epoch' and 'steps' are
        incompatible. To also checkpoint every save_steps steps while keeping
        disk usage bounded, at most the 3 most recent checkpoints are kept.
        '''
        # Setting should_save=True and returning is sufficient
        control.should_save = True
        return control


my_trainer_callback = MyTrainerCallback()
args = TrainingArguments(
    output_dir='qwen20M_v1',
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=5,
    num_train_epochs=4,
    weight_decay=0.1,
    ddp_find_unused_parameters=False,
    warmup_steps=0,
    learning_rate=1e-6,
    evaluation_strategy='steps',
    eval_steps=1000,
    save_steps=1000,
    save_strategy='steps',
    save_total_limit=2,
    report_to='tensorboard',
    optim="adamw_torch",
    lr_scheduler_type='cosine',
    logging_steps=100,
    log_level='info',
    logging_first_step=True,
    fp16=True,
    use_cpu=True,
    # group_by_length=True,
    # deepspeed='./ds_config_one_gpu.json',
)

v_num = len(tokenized_datasets["train"])
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["valid"],
    callbacks=[my_trainer_callback],
)

# Train, then compute the validation perplexity
trainer.train()
eval_results = trainer.evaluate()
print(f"Perplexity: {np.exp(eval_results['eval_loss']):.2f}")

trainer.save_model(args.output_dir)
```

Bro, this is my training code. Please give me some pointers when you have time.
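One interaction in the script above may be relevant to the ",,,," replies: `tokenizer.pad_token_id` is set to `tokenizer.im_end_id`, and `DataCollatorForLanguageModeling(mlm=False)` masks every `pad_token_id` position in the labels to -100, so the real end-of-turn `<|im_end|>` tokens are excluded from the loss as well. A minimal pure-Python sketch of that masking (the token ids and helper names are illustrative assumptions, not Qwen's real values):

```python
# Sketch of how DataCollatorForLanguageModeling(mlm=False) builds labels when
# the pad token doubles as the end-of-turn marker, as in the script above
# (tokenizer.pad_token_id = tokenizer.im_end_id). Ids are made up.

im_end_id = 151645          # stands in for <|im_end|> (illustrative value)
max_seq_len = 8             # shortened from 128 for the demo

def pad(input_ids):
    """Same truncate-then-right-pad logic as the tokenize() function above."""
    ids = input_ids[:max_seq_len]
    return ids + [im_end_id] * (max_seq_len - len(ids))

def make_labels(input_ids):
    """The causal-LM collator copies input_ids and masks pad positions to -100."""
    return [-100 if t == im_end_id else t for t in input_ids]

sample = pad([11, 22, 33, im_end_id])   # a short turn ending in <|im_end|>
labels = make_labels(sample)
# Every im_end_id is excluded from the loss, including the *real* end-of-turn
# token, so the model gets no training signal for when to stop generating.
print(labels)  # [11, 22, 33, -100, -100, -100, -100, -100]
```

If this is a contributing cause, giving the tokenizer a dedicated pad token distinct from `im_end_id` would let the collator mask padding without also hiding the stop token.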

HelixPark commented 3 months ago


Boss, I changed `"hidden_size": 2048` to 512 in config.json. The model did get smaller, but training errors out. Could you share what else you changed?

The error output is as follows (the same output is repeated by each of the 8 DDP worker processes; one copy is shown):

```
Dataset({ features: ['input_ids'], num_rows: 379743 })
Dataset({ features: ['input_ids'], num_rows: 8792 })
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
QWen size: 358.1M parameters
Detected kernel version 4.19.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
Running training
  Num examples = 379,743
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 640
  Gradient Accumulation steps = 10
  Total optimization steps = 593
  Number of trainable parameters = 358,072,832
  0%|          | 0/593 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/llm/tlm/pre_train.py", line 264, in <module>
    trainer.train( #'model_save/pre/checkpoint-3400'
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(inputs)
  [torch.nn.Module / DDP / accelerate wrapper frames omitted]
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
    transformer_outputs = self.transformer(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
    outputs = block(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
    attn_outputs = self.attn(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
    query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
```
_wrapped_call_impl return self._call_impl(*args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, *kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward outputs = block( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward attn_outputs = self.attn( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(args, kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward query, key, value = mixed_x_layer.split(self.split_size, dim=2) ValueError: too many values to unpack (expected 3) 0%| | 0/593 [00:00<?, ?it/s]
[2024-05-08 18:01:42,890] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367112 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367117 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367119 closing signal SIGTERM
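For context on the ValueError above: torch.Tensor.split(split_size, dim) returns one chunk per split_size-sized slice along that dimension (the last chunk may be smaller), so the unpack at modeling_qwen.py line 484 only succeeds when the c_attn output is exactly 3 * split_size wide. A minimal pure-Python sketch of the chunk arithmetic (the concrete sizes below are illustrative, not read from this run):

```python
# torch.Tensor.split(split_size, dim) yields ceil(dim_length / split_size) chunks.
def num_chunks(dim_length: int, split_size: int) -> int:
    return -(-dim_length // split_size)  # ceiling division

# When the projected dimension is exactly 3 * split_size, the unpack
# "query, key, value = mixed_x_layer.split(split_size, dim=2)" gets 3 chunks.
print(num_chunks(3 * 2048, 2048))  # 3

# If the projection stays at 3 * 2048 but split_size shrinks to 512,
# split() returns 12 chunks, and the 3-way unpack raises
# "ValueError: too many values to unpack (expected 3)".
print(num_chunks(3 * 2048, 512))   # 12
```

So whenever the split count differs from 3, the three-name unpack on that line fails with exactly this error.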

2279072142 commented 1 month ago


Hi, I changed "hidden_size": 2048 to 512 in config.json. The model did get smaller, but training now fails with an error. Which settings did you modify?

The error is as follows:

Dataset({
    features: ['input_ids'],
    num_rows: 379743
})
Dataset({
    features: ['input_ids'],
    num_rows: 8792
})
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
QWen size: 358.1M parameters
Detected kernel version 4.19.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
***** Running training *****
  Num examples = 379,743
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 640
  Gradient Accumulation steps = 10
  Total optimization steps = 593
  Number of trainable parameters = 358,072,832
  0%|          | 0/593 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/llm/tlm/pre_train.py", line 264, in <module>
    trainer.train(  # 'model_save/pre/checkpoint-3400'
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
    transformer_outputs = self.transformer(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
    outputs = block(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
    attn_outputs = self.attn(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
    query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
_wrapped_call_impl return self._call_impl(*args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, *kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward outputs = block( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward attn_outputs = self.attn( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(args, kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward query, key, value = mixed_x_layer.split(self.split_size, dim=2) ValueError: too many values to unpack (expected 3) 0%| | 0/593 [00:00<?, ?it/s] [2024-05-08 18:01:42,890] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367112 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367117 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367119 closing signal SIGTERM
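The unpack failure itself is easy to reproduce in isolation: `torch.Tensor.split(split_size, dim)` returns one chunk per `split_size`-wide slice along `dim`, so if `self.split_size` no longer equals one third of the `c_attn` output width, unpacking into three names fails. A minimal sketch with hypothetical sizes (your actual config values may differ):

```python
import torch

# Hypothetical sizes illustrating the mismatch (not the repo's exact config):
hidden_size = 512            # reduced from 2048
kv_channels = 128            # left at the Qwen-1.8B default
num_attention_heads = 16
projection_size = kv_channels * num_attention_heads   # still 2048

# c_attn projects hidden states to 3 * projection_size channels (q|k|v):
mixed_x_layer = torch.zeros(1, 4, 3 * projection_size)

# A split size derived from hidden_size no longer equals projection_size,
# so split() yields 12 chunks instead of the expected 3:
chunks = mixed_x_layer.split(hidden_size, dim=2)
print(len(chunks))  # 12 -> "too many values to unpack (expected 3)"
```

With `kv_channels` recomputed so that `kv_channels * num_attention_heads == hidden_size`, the projection width becomes `3 * hidden_size` and the three-way unpack works again.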

You need to change the kv_channels variable as well: it must satisfy hidden_size = kv_channels * num_attention_heads.
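Concretely, a sketch of that constraint with assumed values (check num_attention_heads in your own config.json; the head count below is hypothetical):

```python
# When shrinking hidden_size, recompute kv_channels from it:
hidden_size = 512                                   # the reduced width
num_attention_heads = 16                            # hypothetical head count
kv_channels = hidden_size // num_attention_heads    # 512 // 16 = 32

# The invariant the QWen attention split relies on:
assert hidden_size == kv_channels * num_attention_heads
```

Set `"kv_channels"` in config.json to the recomputed value before retraining.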

kingpingyue commented 1 month ago

Thanks a lot, I've taken note of that.


I changed "hidden_size": 2048 in config.json to 512. The model did get smaller, but training now fails. Could you tell me what else needs to be modified?

The error log:

Dataset({ features: ['input_ids'], num_rows: 379743 })
Dataset({ features: ['input_ids'], num_rows: 8792 })
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
QWen size: 358.1M parameters
Detected kernel version 4.19.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using auto half precision backend
***** Running training *****
  Num examples = 379,743
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 640
  Gradient Accumulation steps = 10
  Total optimization steps = 593
  Number of trainable parameters = 358,072,832
  0%|          | 0/593 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/llm/tlm/pre_train.py", line 264, in <module>
    trainer.train( #'model_save/pre/checkpoint-3400'
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward
    transformer_outputs = self.transformer(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward
    outputs = block(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward
    attn_outputs = self.attn(
  File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward
    query, key, value = mixed_x_layer.split(self.split_size, dim=2)
ValueError: too many values to unpack (expected 3)
_run_ddp_forward return self.module(*inputs, *kwargs) # type: ignore[index] File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward return model_forward(*args, *kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in call return convert_to_fp32(self.model_forward(args, kwargs)) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast return func(*args, kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 1135, in forward transformer_outputs = self.transformer( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(args, kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 971, in forward outputs = block( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, *kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 691, in forward attn_outputs = self.attn( File "/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(args, kwargs) File 
"/home/anaconda3/envs/park/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/home/llm/tlm/qwen/modeling_qwen.py", line 484, in forward query, key, value = mixed_x_layer.split(self.split_size, dim=2) ValueError: too many values to unpack (expected 3) 0%| | 0/593 [00:00<?, ?it/s] [2024-05-08 18:01:42,890] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367112 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367117 closing signal SIGTERM [2024-05-08 18:01:42,891] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 3367119 closing signal SIGTERM

You need to modify the kv_channels variable so that it satisfies hidden_size = kv_channels * num_attention_heads.
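A minimal arithmetic sketch of why this invariant matters, assuming (as the error suggests) that this version of modeling_qwen.py builds c_attn with output width 3 * kv_channels * num_attention_heads while the split size is tied to hidden_size. The config field names are Qwen's; the concrete values below are hypothetical examples for a shrunk model, not taken from the repo:

```python
# Qwen's attention projects hidden states with c_attn to a width of
# 3 * projection_size (projection_size = kv_channels * num_attention_heads),
# then unpacks `mixed_x_layer.split(self.split_size, dim=2)` into exactly
# three tensors (query, key, value). If hidden_size no longer equals
# kv_channels * num_attention_heads, the split yields a chunk count other
# than 3, hence "ValueError: too many values to unpack (expected 3)".

# Hypothetical values for a shrunk model (illustration only):
hidden_size = 512
num_attention_heads = 8
kv_channels = 128  # left at the Qwen-1.8B default -> mismatch

projection_size = kv_channels * num_attention_heads  # 1024
c_attn_width = 3 * projection_size                   # 3072
print(c_attn_width // hidden_size)                   # 6 chunks instead of 3 -> unpack fails

# Fix: derive kv_channels from the new hidden size before training.
kv_channels = hidden_size // num_attention_heads     # 64
assert hidden_size == kv_channels * num_attention_heads
```

In practice this means that when you shrink the model config, set kv_channels together with hidden_size and num_attention_heads rather than keeping the original 1.8B value.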
