[Feature] Does mmselfsup support Fully Sharded Data Parallel (FSDP)?

open-mmlab / mmselfsup

OpenMMLab Self-Supervised Learning Toolbox and Benchmark

https://mmselfsup.readthedocs.io/en/latest/

Apache License 2.0

[Feature] Does mmselfsup support Fully Sharded Data Parallel (FSDP)? #656

Closed linmou closed 1 year ago

linmou commented 1 year ago

What is the problem this feature will solve?

Some transformer models are too large to be loaded on a single GPU. Does mmselfsup provide an API for loading such large models across more than one GPU?

What is the feature?

A user-friendly API for placing different layers of a huge model on several devices, similar to Hugging Face models' model.parallelize().

What alternatives have you considered?

No response

fangyixiao18 commented 1 year ago

FSDP is supported in MMEngine, so you can also apply it in MMSelfSup. You can try it by following the docs here: https://github.com/open-mmlab/mmengine/blob/main/docs/en/examples/save_gpu_memory.md
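
For reference, here is a minimal sketch of the kind of config change the linked doc describes. It assumes the `model_wrapper_cfg` field of the MMEngine Runner and the `MMFullyShardedDataParallel` wrapper name. Both are taken from MMEngine rather than from this thread, so please check them against the doc for your installed version.

```python
# Config fragment (sketch, not a verified MMSelfSup recipe):
# ask the MMEngine Runner to wrap the model with its FSDP wrapper.
# Assumption: the field name and wrapper type match your MMEngine version.
model_wrapper_cfg = dict(
    type='MMFullyShardedDataParallel',  # MMEngine's wrapper around PyTorch FSDP
    cpu_offload=True,                   # optionally offload sharded parameters to CPU
)
```

FSDP shards parameters across processes, so this should only have an effect when training is launched in distributed mode on more than one GPU.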