[NF4][FSDP2] avoid peaking GPU memory when constructing NF4 tensors

pytorch / ao

Native PyTorch library for quantization and sparsity

https://pytorch.org/ao

BSD 3-Clause "New" or "Revised" License


Status: Open (weifengpy opened 1 month ago)

weifengpy commented 1 month ago

Construct NF4 tensors in chunks and check the memory traces: https://github.com/pytorch/ao/pull/196
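For readers following along, here is a minimal sketch of what "in chunks" could look like. It is not the code from the linked PR. It assumes the memory peak comes from a nearest-level lookup that broadcasts every input element against a 16-entry codebook, which materializes a temporary 16 times the size of the input. The names `LEVELS`, `quantize_nearest`, and `quantize_nearest_chunked` are hypothetical, and `LEVELS` is an evenly spaced placeholder rather than the real NF4 codebook.

```python
import torch

# Placeholder 16-entry codebook (assumption): the real NF4 levels are not evenly spaced.
LEVELS = torch.linspace(-1.0, 1.0, 16)


def quantize_nearest(x: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    """One-shot lookup: builds an (N, 16) temporary of distances to every level."""
    diff = (x.unsqueeze(-1) - levels).abs()
    return diff.argmin(dim=-1).to(torch.uint8)


def quantize_nearest_chunked(
    x: torch.Tensor, levels: torch.Tensor, chunk_size: int = 1024
) -> torch.Tensor:
    """Same indices as quantize_nearest, but the temporary is at most (chunk_size, 16)."""
    flat = x.reshape(-1)
    out = torch.empty(flat.numel(), dtype=torch.uint8, device=x.device)
    for start in range(0, flat.numel(), chunk_size):
        chunk = flat[start:start + chunk_size]
        out[start:start + chunk.numel()] = quantize_nearest(chunk, levels)
    return out.reshape(x.shape)


if torch.cuda.is_available():
    # Compare peak allocations of the two paths on the same input.
    x = torch.randn(1 << 20, device="cuda").clamp_(-1, 1)
    levels = LEVELS.to("cuda")
    for fn in (quantize_nearest, quantize_nearest_chunked):
        torch.cuda.reset_peak_memory_stats()
        fn(x, levels)
        print(fn.__name__, torch.cuda.max_memory_allocated())
```

The chunk size trades peak memory against the number of kernel launches, so a real implementation would likely pick it based on the tensor size. `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated` give a quick number to compare; a full memory trace would show where the peak occurs.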

cpuhrsch commented 1 week ago

The linked PR was merged - is this resolved?