Hi guys! I got the following error when using Unsloth patch 2024.7 to resume training from a checkpoint.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
I did not encounter this error when using the older version.
Just curious, is it possible to install and use the older version of Unsloth?
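To make the question concrete, what I have in mind is just pinning the package version, e.g. pip install unsloth==2024.6 (the version string 2024.6 is only a guess at an earlier release, and I'm assuming the older releases are published on PyPI; if they are only on GitHub, installing from a tagged commit with pip's git+https syntax should work the same way).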
Edit
Here is the full error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[13], line 1
----> 1 trainer_stats = trainer.train("/kaggle/working/outputs/checkpoint-525")
2 # trainer_stats = trainer.train()
File <string>:123, in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
File <string>:422, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
File /opt/conda/lib/python3.10/site-packages/accelerate/optimizer.py:157, in AcceleratedOptimizer.step(self, closure)
154 if self.scaler is not None:
155 self.optimizer.step = self._optimizer_patched_step_method
--> 157 self.scaler.step(self.optimizer, closure)
158 self.scaler.update()
160 if not self._accelerate_step_called:
161 # If the optimizer step was skipped, gradient overflow was detected.
File /opt/conda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:452, in GradScaler.step(self, optimizer, *args, **kwargs)
446 self.unscale_(optimizer)
448 assert (
449 len(optimizer_state["found_inf_per_device"]) > 0
450 ), "No inf checks were recorded for this optimizer."
--> 452 retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
454 optimizer_state["stage"] = OptState.STEPPED
456 return retval
File /opt/conda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:350, in GradScaler._maybe_opt_step(self, optimizer, optimizer_state, *args, **kwargs)
348 retval: Optional[float] = None
349 if not sum(v.item() for v in optimizer_state["found_inf_per_device"].values()):
--> 350 retval = optimizer.step(*args, **kwargs)
351 return retval
File /opt/conda/lib/python3.10/site-packages/accelerate/optimizer.py:212, in patch_optimizer_step.<locals>.patched_step(*args, **kwargs)
210 def patched_step(*args, **kwargs):
211 accelerated_optimizer._accelerate_step_called = True
--> 212 return method(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:75, in LRScheduler.__init__.<locals>.with_counter.<locals>.wrapper(*args, **kwargs)
73 instance._step_count += 1
74 wrapped = func.__get__(instance, cls)
---> 75 return wrapped(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py:385, in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
380 else:
381 raise RuntimeError(
382 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
383 )
--> 385 out = func(*args, **kwargs)
386 self._optimizer_step_code()
388 # call optimizer step post hooks
File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/galore_torch/adamw8bit.py:52, in AdamW8bit.step(self, closure)
49 group['weight_decay_saved'] = group['weight_decay']
50 group['weight_decay'] = 0
---> 52 grad = state["projector"].project(p.grad, state["step"])
54 # suboptimal implementation
55 p.saved_data = p.data.clone()
File /opt/conda/lib/python3.10/site-packages/galore_torch/galore_projector.py:22, in GaLoreProjector.project(self, full_rank_grad, iter)
20 if self.ortho_matrix is None or iter % self.update_proj_gap == 0:
21 self.ortho_matrix = self.get_orthogonal_matrix(full_rank_grad, self.rank, type='left')
---> 22 low_rank_grad = torch.matmul(self.ortho_matrix.t(), full_rank_grad)
23 elif self.proj_type == 'reverse_std':
24 if full_rank_grad.shape[0] >= full_rank_grad.shape[1]:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
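From the traceback, the failing matmul in GaLoreProjector.project mixes self.ortho_matrix (apparently left on the CPU after the checkpoint restore) with the gradient on cuda:0. While waiting for a fix, a possible workaround is to push everything in the restored optimizer state, including each GaLore projector's ortho_matrix, onto the parameter's device before the first resumed step. This is only a sketch under that assumption, and the helper name is hypothetical:

import torch

def move_galore_state_to_param_device(optimizer):
    # Hypothetical helper: walk the optimizer state restored from the
    # checkpoint and move every tensor back to its parameter's device.
    for group in optimizer.param_groups:
        for p in group["params"]:
            state = optimizer.state.get(p, {})
            for key, value in state.items():
                if torch.is_tensor(value):
                    state[key] = value.to(p.device)
                # The GaLore projector stores its projection matrix as an
                # attribute on the projector object (state["projector"]),
                # not as a raw tensor in the state dict.
                elif hasattr(value, "ortho_matrix") and torch.is_tensor(value.ortho_matrix):
                    value.ortho_matrix = value.ortho_matrix.to(p.device)

Calling this on the trainer's optimizer after the checkpoint state is loaded but before the next optimizer.step() should avoid the mixed-device matmul. With the HF Trainer the optimizer is only built inside train(), so this would likely have to run from a TrainerCallback (e.g. on_train_begin), assuming that callback fires after the optimizer state is restored; I haven't verified the ordering inside Unsloth's patched _fast_inner_training_loop.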