Currently we manually "delete" tensors found on the GPU by overwriting them with empty data. This causes GPU-side memory access errors with 4-bit models. Ideally we would get rid of this workaround and unload models the normal way.
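Unloading "the normal way" usually means dropping every Python reference to the model, running `gc.collect()`, and then (with PyTorch on CUDA) calling `torch.cuda.empty_cache()` so cached blocks go back to the driver. Below is a minimal sketch, assuming models live in a dict-like registry; the `models` dict, `unload_model` helper, and `DummyModel` class are hypothetical. A weak reference serves as a probe: if it is still alive after collection, something else holds the model. PyTorch is optional so the sketch runs without a GPU.

```python
import gc
import weakref

try:
    import torch
except ImportError:  # let the sketch run on machines without PyTorch
    torch = None


def unload_model(models: dict, name: str) -> bool:
    """Drop the registry's reference to a model; return True if the model was freed."""
    probe = weakref.ref(models[name])  # a weak reference does not keep the model alive
    del models[name]                   # drop the registry's strong reference
    gc.collect()                       # reclaim reference cycles that refcounting alone misses
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()       # return cached, now-unused blocks to the driver
    return probe() is None             # False means something still holds the model


class DummyModel:  # hypothetical stand-in for a real model object
    pass


models = {"llm": DummyModel()}
print(unload_model(models, "llm"))  # True: nothing else referenced the model
```

If `unload_model` returns False, some other reference is keeping the model (and its GPU tensors) alive, which is the situation described below.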
Somewhere there's a dangling reference to the model.
The gc module shows some references, but after pruning those, gc reports the same references as a control environment without the bug.
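One way to run the comparison against the control environment is to list the direct referrers of the model with `gc.get_referrers`. Below is a minimal sketch; the `describe_referrers` helper and `DummyModel` class are hypothetical. Note that `gc.get_referrers` only finds containers that support garbage collection, so a reference held by an extension type that does not participate in gc will not appear in its output.

```python
import gc
import inspect


def describe_referrers(obj, ignore=()):
    """List the type names of gc-tracked objects that directly refer to `obj`."""
    gc.collect()  # collect dead cycles first so only live referrers remain
    ignored = {id(o) for o in ignore}
    found = []
    for referrer in gc.get_referrers(obj):
        # Skip stack frames (e.g. this function's own) and anything the caller ignores.
        if inspect.isframe(referrer) or id(referrer) in ignored:
            continue
        found.append(type(referrer).__name__)
    return found


class DummyModel:  # hypothetical stand-in for a real model object
    pass


model = DummyModel()
cache = [model]  # simulated dangling reference
print(describe_referrers(model))
```

The output is a list of type names, which can be diffed between the buggy run and the control run; the exact entries depend on where the model is stored.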