huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] "output tensor must have the same type as input tensor" error when i tried to finetune localy #669

Closed hichambht32 closed 2 months ago

hichambht32 commented 4 months ago

Hello everyone, I have 4 RTX 3080 GPUs with 10 GiB each and I'm trying to fine-tune Mistral 7B v2.0 locally. I've tried to optimize as much as I can (Accelerate with DeepSpeed, 4-bit quantization, LoRA, and so on), but now I'm getting this "output tensor must have the same type as input tensor" error. My CSV dataset has one column called Text, which contains question-answer pairs. Can you suggest a fix?

This is the error:

```
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.2301530838012695 seconds
Parameter Offload: Total persistent parameters: 20189184 in 417 params
INFO | 2024-06-06 17:33:31 | autotrain.trainers.common:on_train_begin:231 - Starting to train...
  0%|          | 0/20 [00:00<?, ?it/s]
(myenv) rag@PC-RAG:~/finetune$ ERROR | 2024-06-06 17:34:26 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 28, in train
    train_sft(config)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 98, in train
    trainer.train()
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
    self.accelerator.backward(loss)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2007, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 175, in backward
    self.engine.step()
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2169, in step
    self._take_model_step(lr_kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2075, in _take_model_step
    self.optimizer.step()
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2060, in step
    self._post_step(timer_names)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 1986, in _post_step
    self.persistent_parameters[0].all_gather(self.persistent_parameters)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1121, in all_gather
    return self._all_gather(param_list, async_op=async_op, hierarchy=hierarchy)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1465, in _all_gather
    self._allgather_params_coalesced(all_gather_nonquantize_list, hierarchy, quantize=False)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1769, in _allgather_params_coalesced
    h = dist.all_gather_into_tensor(allgather_params[param_idx],
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
    return func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 305, in all_gather_into_tensor
    return cdb.all_gather_into_tensor(output_tensor=output_tensor, input_tensor=tensor, group=group, async_op=async_op)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 213, in all_gather_into_tensor
    return self.all_gather_function(output_tensor=output_tensor,
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/home/rag/finetune/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2948, in all_gather_into_tensor
    work = group._allgather_base(output_tensor, input_tensor, opts)
TypeError: output tensor must have the same type as input tensor

ERROR | 2024-06-06 17:34:26 | autotrain.trainers.common:wrapper:121 - output tensor must have the same type as input tensor
```
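For context, the exception is raised inside DeepSpeed ZeRO-3's parameter all-gather (`_allgather_params_coalesced` → `torch.distributed.all_gather_into_tensor`), which requires the gather buffer and the parameter shard to share a dtype. A commonly reported way to end up with mismatched dtypes is combining bitsandbytes 4-bit quantization (whose packed weights are stored as integer tensors) with ZeRO-3 parameter partitioning; similar reports usually suggest using ZeRO stage 2 with 4-bit + LoRA, or dropping the 4-bit quantization when staying on ZeRO-3. Below is a minimal diagnostic sketch, not taken from the issue, that makes the dtype mix visible before launching training; the model id and the `BitsAndBytesConfig` settings are assumptions about the reporter's setup:

```python
# Hypothetical pre-flight check (not AutoTrain code): load the model the same
# way the training job would and list the distinct parameter dtypes. A mix of
# integer (packed 4-bit) and floating-point dtypes is the kind of inconsistency
# that ZeRO-3's all-gather then trips over.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",      # assumed model id
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,                     # assumed 4-bit setup from the report
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)

# With 4-bit weights present this typically shows more than one dtype,
# e.g. packed integer weights alongside float16 parameters.
print(Counter(p.dtype for p in model.parameters()))
```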

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 20 days since being marked as stale.