iwiwi / epochraft-hf-fsdp

Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP
MIT License

Add CPU offloading option #3

Closed · iwiwi closed this issue 1 year ago

iwiwi commented 1 year ago


Rumor has it that with this option we can train Llama-2 70B even on 40GB GPUs.
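
For reference, a minimal sketch of how FSDP's CPU offloading is enabled. `CPUOffload` and the `cpu_offload` argument are the actual PyTorch FSDP API; the `wrap_model` helper and its signature are illustrative assumptions, not necessarily this repo's code:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload


def wrap_model(model: torch.nn.Module, cpu_offload: bool) -> FSDP:
    # With offload_params=True, sharded parameters (and their gradients)
    # are kept in host RAM and copied to the GPU only around each
    # forward/backward pass, trading transfer time for GPU memory.
    return FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True) if cpu_offload else None,
    )
```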

iwiwi commented 1 year ago

An interesting finding: when CPU offloading is enabled, model.device reports "cpu", so we needed a workaround for that behavior.
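
A minimal sketch of the kind of fix this implies: derive the target device from the local CUDA context rather than from the wrapped model. The helper name and the assumption that a batch is a dict of tensors are illustrative, not taken from the repo:

```python
import torch


def move_batch_to_device(batch: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Under CPU offloading, model.device is "cpu", so it cannot be used to
    # place inputs; use the process's local CUDA device instead.
    device = torch.device("cuda", torch.cuda.current_device())
    return {k: v.to(device, non_blocking=True) for k, v in batch.items()}
```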