I have encountered a problem when using QLoRA through the LoRALinear class in torchtune. When the `quantize_base` parameter is set to True, the NF4Tensor class is used, which quantizes the tensor to 4 bits via the `to_nf4` function.
That function casts the tensor to bf16 without specifying a device. Since a tensor stays on CPU by default unless placed elsewhere, this produces a "tensors on two different devices, CPU and GPU" error. I hit this error while trying to run QLoRA on my GPU.
Here is the original function versus the version I modified to work around it.
Original function, from torchao.dtypes.nf4tensor:

```python
def to_nf4(tensor, block_size: int = 64, scaler_block_size: int = 256):
    tensor1 = tensor.to(torch.bfloat16)
    return NF4Tensor.from_tensor(tensor1, block_size, scaler_block_size)
```
Modified function that works around the problem:

```python
def to_nf4(tensor, device, block_size: int = 64, scaler_block_size: int = 256):
    tensor1 = tensor.to(device=device, dtype=torch.bfloat16)
    return NF4Tensor.from_tensor(tensor1, block_size, scaler_block_size)
```
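The key change is combining the device move and the dtype cast into a single `.to()` call, so the tensor handed to `NF4Tensor.from_tensor` is already on the intended device. A minimal sketch of that pattern in plain PyTorch (no torchao needed; the function name `move_and_cast` is just for illustration):

```python
import torch

def move_and_cast(tensor: torch.Tensor, device: str) -> torch.Tensor:
    # One .to() call both moves the tensor to `device` and casts it to bf16,
    # so any quantization applied afterwards stays on the intended device.
    return tensor.to(device=device, dtype=torch.bfloat16)

w = torch.randn(4, 4)             # created on CPU in float32 by default
w_bf16 = move_and_cast(w, "cpu")  # pass "cuda" when a GPU is available
print(w_bf16.dtype)
print(w_bf16.device)
```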