Open felipemello1 opened 16 hours ago
1 is a known issue. You can see my view here: https://github.com/pytorch/ao/issues/959#issuecomment-2378225308. I will look into the torch.optim.Optimizer base class to see what could go wrong if I make CPUOffloadOptimizer inherit from it. For example, off the top of my head, CPUOffloadOptimizer will not have self.state.

In the meantime, CPUOffloadOptimizer requires setting the LR manually: https://github.com/pytorch/ao/pull/584#issuecomment-2364915318
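A minimal sketch of the manual-LR workaround described above. The schedule function and the `FakeOptim` stand-in are hypothetical; the only assumption (per the linked comment) is that the optimizer exposes `param_groups` as a list of dicts whose `"lr"` entry you overwrite each step instead of calling a scheduler:

```python
def linear_warmup_decay(step, warmup_steps=100, total_steps=1000):
    """Hypothetical schedule: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

base_lr = 1e-3

# Stand-in for CPUOffloadOptimizer: we only assume it exposes
# `param_groups` as a list of dicts with an "lr" key.
optimizer = type("FakeOptim", (), {"param_groups": [{"lr": base_lr}]})()

for step in range(1000):
    # Instead of scheduler.step(), set the LR by hand every iteration.
    for group in optimizer.param_groups:
        group["lr"] = base_lr * linear_warmup_decay(step)
    # ... optimizer.step() / optimizer.zero_grad() would go here ...
```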
For 2, it's an oversight on my part. We can simply add a requires_grad check here. Will push a fix. https://github.com/pytorch/ao/blob/27619174ed5a372a1ce96a0615089c5a08c88566/torchao/prototype/low_bit_optim/cpu_offload.py#L68-L77
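A minimal sketch of the requires_grad check proposed above (the `Param` stand-in and `trainable_params` helper are hypothetical illustrations, not torchao's actual code): before registering parameters, drop the frozen ones, which is what makes passing all of `model.parameters()` safe under QLoRA, where the base model is frozen.

```python
class Param:
    """Stand-in for torch.nn.Parameter, keeping only the flag we filter on."""
    def __init__(self, name, requires_grad=True):
        self.name = name
        self.requires_grad = requires_grad

def trainable_params(params):
    # Proposed fix: keep only params that require grad (e.g. LoRA adapters),
    # skipping the frozen base-model weights.
    return [p for p in params if p.requires_grad]

model_params = [
    Param("base.weight", requires_grad=False),  # frozen base weight
    Param("lora_a", requires_grad=True),
    Param("lora_b", requires_grad=True),
]
kept = trainable_params(model_params)
```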
Hi all, I was giving the CPUOffloadOptimizer a try and found two issues when using it with QLoRA single device in torchtune:

1. When using an LR scheduler I got an error. Maybe there is a way to inherit the optimizer class?
2. When passing model.parameters() I got the error below. I imagine that a simple fix is to keep only the params that require grad, like the AdamW implementation does.

cc: @gau-nernst