lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

leave only 45 conversations in dummy.json result in error #1097

Open luckyfish0826 opened 1 year ago

luckyfish0826 commented 1 year ago

At first we edited the dummy.json file, changing "my name is Vicuna" to "my name is XXXXX" while keeping all the other conversations (910 in total), then trained. The new model worked fine for English output, but failed when we asked it questions in other languages.

So, in order to narrow down the problem, we made the same change but kept only the 45 "who are you" conversations (deleting the other 865), then trained again. This time we hit the error below:

RuntimeError: The size of tensor a (32768512) must match the size of tensor b (262148096) at non-singleton dimension 0

The full traceback is below. Can anyone help?

Not sure whether this qualifies as an issue, but we could not find a better place to raise this problem.


2023-05-08 10:33:13 [INFO] [Driver] Traceback (most recent call last):
  File "/home/xxxxxx/source/FastChat/fastchat/train/train_mem.py", line 13, in <module>
    train()
  File "/home/xxxxxx/source/FastChat/fastchat/train/train.py", line 245, in train
    trainer.train()
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/transformers/trainer.py", line 1996, in _inner_training_loop
    self.optimizer.step()
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/adamw.py", line 162, in step
    adamw(params_with_grad,
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/adamw.py", line 219, in adamw
    func(params,
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/adamw.py", line 273, in _single_tensor_adamw
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (32768512) must match the size of tensor b (262148096) at non-singleton dimension 0
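One observation about the error message (an inference from the numbers alone, not something confirmed in this thread): the two mismatched sizes differ by exactly a factor of 8, which would be consistent with an optimizer state built for a parameter shard while the gradient covers a full flat parameter, e.g. under 8-way sharded training:

```python
# Mismatched tensor sizes taken from the RuntimeError above.
# Only the factor-of-8 relationship comes from the log; the sharding
# interpretation is an assumption.
a = 32_768_512    # size of tensor a (optimizer state exp_avg)
b = 262_148_096   # size of tensor b (gradient)
print(b // a, b % a)  # -> 8 0: b is exactly 8 times a
```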

luckyfish0826 commented 1 year ago

Update: the same error occurs on both 0.2.3 and 0.2.5.

gxy-gxy commented 1 year ago

Do you have gradient accumulation steps larger than your dataset size?

luckyfish0826 commented 1 year ago

Do you have gradient accumulation steps larger than your dataset size?

Not quite sure about this. In my case, I changed nothing but the dummy.json file. There seems to be a minimum conversation count required; after testing, we found it is about 100. Really weird.

gxy-gxy commented 1 year ago

Oh, I ran into the same problem before. It turned out I was using a small dataset and had set the gradient accumulation steps larger than the dataset size. Everything went back to normal after I reduced the accumulation steps. Maybe your situation is similar. Hope this helps!
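The workaround described above amounts to capping the accumulation steps at the number of micro-batches actually available per epoch. A minimal sketch, with illustrative names and numbers:

```python
# Cap gradient accumulation so at least one full optimizer step fits in an
# epoch. All values are hypothetical, not FastChat defaults.
def capped_accum_steps(requested, num_examples, per_device_batch, num_gpus):
    micro_batches = max(1, num_examples // (per_device_batch * num_gpus))
    return max(1, min(requested, micro_batches))

# 45 examples at 2/GPU on 8 GPUs yield only 2 micro-batches per epoch,
# so a requested value of 16 is reduced to 2.
print(capped_accum_steps(16, 45, 2, 8))  # -> 2
```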

luckyfish0826 commented 1 year ago

Thank you, I am new to LLMs. I basically understand your point; I'll try it and see what happens.