pytorch / ao

PyTorch native quantization and sparsity for training and inference

Error when using the to_nf4 function inside the NF4Tensor class #268

Open · FabioDataGeek opened this issue 5 months ago

FabioDataGeek commented 5 months ago

I have encountered a problem when using QLoRA via the LoRALinear class you have in Torchtune. When the `quantize_base` parameter is set to True, the NF4Tensor class is invoked, and the weight tensor is converted to 4 bits by the `to_nf4` function.

That function casts the tensor to bf16 without specifying a device, which leads to the usual error about tensors living on two different devices, CPU and GPU: since `tensor.to(torch.bfloat16)` only changes the dtype, the result stays on whatever device the input is on (the CPU by default), while the rest of the computation runs on the GPU. I got this error trying to run QLoRA on my GPU.
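To make the failure concrete, here is a minimal sketch of how it surfaces (the shapes and the repro are illustrative, not from my actual training run):

```python
import torch
from torchao.dtypes.nf4tensor import to_nf4

weight = torch.randn(512, 512)   # model weights start on the CPU by default
nf4_weight = to_nf4(weight)      # casts to bf16 but never moves the tensor
print(nf4_weight.device)         # cpu

# Meanwhile the activations live on the GPU, so any op that mixes them
# with the CPU-resident NF4 weight raises a RuntimeError along the lines of
# "Expected all tensors to be on the same device, but found at least two devices".
x = torch.randn(4, 512, device="cuda", dtype=torch.bfloat16)
```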

Here is the original function next to the version I modified to work around the problem.

Original function, from torchao.dtypes.nf4tensor:

```python
def to_nf4(tensor, block_size: int = 64, scaler_block_size: int = 256):
    # Only the dtype changes here; the tensor stays on its current device.
    tensor1 = tensor.to(torch.bfloat16)
    return NF4Tensor.from_tensor(tensor1, block_size, scaler_block_size)
```

Modified function that works around the problem:

```python
def to_nf4(tensor, device, block_size: int = 64, scaler_block_size: int = 256):
    # Cast the dtype and move to the target device in a single call.
    tensor1 = tensor.to(device=device, dtype=torch.bfloat16)
    return NF4Tensor.from_tensor(tensor1, block_size, scaler_block_size)
```
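With the patched signature the caller can place the quantized weight directly (illustrative usage of my modified version; the shapes are arbitrary):

```python
weight = torch.randn(512, 512)              # fp32 weight created on the CPU
nf4_weight = to_nf4(weight, device="cuda")  # move and cast before quantizing
print(nf4_weight.device)                    # cuda:0
```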

cpuhrsch commented 5 months ago

@FabioDataGeek would it work to do `to_nf4([...]).to(device)`?

cpuhrsch commented 5 months ago

@FabioDataGeek - I also sent a PR to add the functionality you mention: https://github.com/pytorch/ao/pull/324
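For reference, the pattern both comments point at would look something like this (a sketch only; it assumes NF4Tensor handles device moves in `.to()`, which is what the linked PR is about):

```python
import torch
from torchao.dtypes.nf4tensor import to_nf4

# Quantize on the CPU, then move the resulting NF4 tensor to the GPU.
nf4_weight = to_nf4(torch.randn(512, 512)).to("cuda")
print(nf4_weight.device)  # cuda:0
```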