pytorch / ao

Native PyTorch library for quantization and sparsity
https://pytorch.org/ao
BSD 3-Clause "New" or "Revised" License

[NF4][FSDP2] avoid peaking GPU memory when constructing NF4 tensors #204

Open weifengpy opened 1 month ago

weifengpy commented 1 month ago

Construct NF4 tensors in chunks and check the memory traces to confirm the peak is gone: https://github.com/pytorch/ao/pull/196
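
To make the "in chunks" idea concrete, here is a minimal, framework-agnostic sketch in plain Python. It is not the code from the linked PR and does not use torchao's API. The names `quantize_block` and `quantize_in_chunks`, the `chunk_blocks` parameter, and the evenly spaced 16-level codebook (a stand-in for the real NF4 table) are all assumptions made for illustration. The point it shows is that the output is preallocated once and the per-element temporaries exist for only one chunk at a time, so the peak working set is bounded by the chunk size rather than the full tensor size.

```python
from array import array

# Hypothetical 16-level codebook, evenly spaced in [-1, 1].
# This is a stand-in for illustration, NOT the real NF4 lookup table.
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_block(block):
    """Quantize one block: return (absmax scale, list of 4-bit codes)."""
    # Per-block absmax scaling; fall back to 1.0 for an all-zero block.
    scale = max(abs(v) for v in block) or 1.0
    # Nearest codebook entry for each normalized value.
    codes = [
        min(range(16), key=lambda i: abs(v / scale - CODEBOOK[i]))
        for v in block
    ]
    return scale, codes


def quantize_in_chunks(weights, block_size=4, chunk_blocks=2):
    """Quantize `weights` chunk by chunk (hypothetical helper).

    Returns (codes, scales, peak), where `peak` is the largest number of
    input elements held in a temporary slice at any one time.
    """
    n = len(weights)
    if n % block_size:
        raise ValueError("len(weights) must be a multiple of block_size")

    codes = array("B", bytes(n))  # output preallocated once, one code per byte
    scales = array("d")           # one scale per block
    chunk = block_size * chunk_blocks
    peak = 0

    for start in range(0, n, chunk):
        piece = weights[start:start + chunk]  # only one chunk is live
        peak = max(peak, len(piece))
        for off in range(0, len(piece), block_size):
            scale, block_codes = quantize_block(piece[off:off + block_size])
            scales.append(scale)
            lo = start + off
            codes[lo:lo + block_size] = array("B", block_codes)

    return codes, scales, peak


if __name__ == "__main__":
    w = [1.0, -1.0, 0.5, 0.25, 4.0, -4.0, 2.0, 1.0]
    codes, scales, peak = quantize_in_chunks(w, block_size=4, chunk_blocks=1)
    print(list(codes), list(scales), peak)
```

Chunk boundaries fall on block boundaries, so the result is identical to quantizing everything in one pass; only the size of the temporaries changes. On a GPU the same property would have to be confirmed from the memory traces, as the comment above suggests.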

cpuhrsch commented 1 week ago

The linked PR was merged - is this resolved?