microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[REQUEST] Mixed dtype for model parameters #4689

Open jedyang97 opened 11 months ago

jedyang97 commented 11 months ago

Is your feature request related to a problem? Please describe.

Is it possible to support models whose parameters have mixed dtypes? For example, a large multimodal LLM might keep its vision encoder in dtype=float32 and its LLM part in dtype=bfloat16. This would be particularly helpful because some customized vision models (e.g., ones built on MinkowskiEngine) don't support float16/bfloat16.
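For readers who want a concrete picture of the layout being described, here is a minimal plain-PyTorch sketch (no DeepSpeed involved). The class `ToyMultimodalModel`, the helper `dtype_census`, and all layer sizes are hypothetical stand-ins, not code from the issue; the two `nn.Linear` layers only play the roles of a vision encoder and an LLM.

```python
from collections import Counter

import torch
from torch import nn


class ToyMultimodalModel(nn.Module):
    """Hypothetical stand-in: float32 vision encoder feeding a bfloat16 LLM part."""

    def __init__(self, image_dim=32, hidden_dim=16, vocab_size=10):
        super().__init__()
        # Left in the default float32, as a vision library without bf16 support would need.
        self.vision_encoder = nn.Linear(image_dim, hidden_dim)
        # Only the language-model part is converted to bfloat16.
        self.language_model = nn.Linear(hidden_dim, vocab_size).to(torch.bfloat16)

    def forward(self, images):
        features = self.vision_encoder(images.float())
        # Activations are cast once, at the boundary between the two parts.
        return self.language_model(features.to(torch.bfloat16))


def dtype_census(module):
    """Count how many parameter tensors the module holds per dtype."""
    return Counter(p.dtype for p in module.parameters())


model = ToyMultimodalModel()
census = dtype_census(model)
logits = model(torch.randn(4, 32))

print(census)        # two float32 tensors and two bfloat16 tensors (weight + bias each)
print(logits.dtype)  # torch.bfloat16
```

The explicit cast in `forward` is one possible way to join the two parts; where that cast lives in a real model is a design choice this sketch does not settle.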

Describe the solution you'd like

Add a flag (e.g., dont_change_dtype) to DeepSpeedEngine that allows loading an nn.Module model without modifying the dtypes of its parameters (e.g., some params might be float32 while others are bfloat16).
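To make concrete what such a flag would need to preserve, the sketch below uses plain PyTorch only. It assumes that a blanket `module.bfloat16()` call is a fair stand-in for a whole-model dtype conversion; that is an assumption for illustration, not a statement about DeepSpeed's internals. The helpers `snapshot_dtypes` and `changed_dtypes` are hypothetical names.

```python
import torch
from torch import nn


def snapshot_dtypes(module):
    """Record the dtype of every named parameter."""
    return {name: p.dtype for name, p in module.named_parameters()}


def changed_dtypes(module, expected):
    """Return {name: (old_dtype, new_dtype)} for parameters whose dtype differs."""
    return {
        name: (expected[name], p.dtype)
        for name, p in module.named_parameters()
        if p.dtype != expected[name]
    }


# Hypothetical mixed-dtype model: float32 vision part, bfloat16 LLM part.
model = nn.ModuleDict({
    "vision_encoder": nn.Linear(8, 4),
    "language_model": nn.Linear(4, 2).to(torch.bfloat16),
})

expected = snapshot_dtypes(model)

# Assumed stand-in for a whole-model conversion: every floating-point
# parameter becomes bfloat16, so the float32 vision encoder is overwritten.
model.bfloat16()

changed = changed_dtypes(model, expected)
print(sorted(changed))  # ['vision_encoder.bias', 'vision_encoder.weight']
```

A check like `changed_dtypes` could be run after wrapping a model to see whether the original mixed layout survived; it only detects the change and does not undo it.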

ZCMax commented 3 months ago

Hello, did you find a solution for this?