bcui19 closed this issue 1 year ago.
@bcui19 I think we should have a quick sync on what inference workflow we want to encourage here. I think we should refactor these scripts to load HF models only, not Composer checkpoints, and benchmarking should support either raw HF `generate` or the DeepSpeed inference wrapper. I don't think we want to reference any training YAMLs or ComposerMosaicGPT models when we get to inference time.
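One way to support both paths with a single benchmark is to time `.generate()` directly, since a raw HF model and a DeepSpeed-wrapped one both expose that method. A minimal sketch (the `benchmark_generate` helper and its parameters are hypothetical, not part of any existing script):

```python
import time

def benchmark_generate(model, inputs, n_iters=3, warmup=1, **gen_kwargs):
    """Return average wall-clock seconds per .generate() call.

    Works for either a raw Hugging Face model or one wrapped by DeepSpeed
    inference, since both expose .generate(**inputs, **gen_kwargs).
    """
    for _ in range(warmup):  # warmup calls are excluded from timing
        model.generate(**inputs, **gen_kwargs)
    start = time.perf_counter()
    for _ in range(n_iters):
        model.generate(**inputs, **gen_kwargs)
    return (time.perf_counter() - start) / n_iters
```

For the DeepSpeed path, the model would first be wrapped (e.g. with `deepspeed.init_inference(model, ...)`) before being passed to the same helper, so the two backends are measured identically.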
Basically the export flow I want to encourage is: (Training YAML + Composer ckpt) -> (HF folder with model and tokenizer inside) -> optionally ONNX.
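The middle step of that flow (an HF folder with both model and tokenizer inside) can be sketched with the standard Hugging Face `save_pretrained` API; the `export_to_hf_folder` helper name here is hypothetical:

```python
import os

def export_to_hf_folder(model, tokenizer, out_dir):
    """Write model weights/config and tokenizer files into one folder,
    so inference-time code only needs from_pretrained(out_dir)."""
    os.makedirs(out_dir, exist_ok=True)
    model.save_pretrained(out_dir)      # config + weights
    tokenizer.save_pretrained(out_dir)  # tokenizer files
    return out_dir
```

With the folder written this way, the optional ONNX step can consume it without ever touching a training YAML or Composer checkpoint.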
Checkout this JIRA for more details: https://mosaicml.atlassian.net/browse/RESEARCH-589
If you want to add DeepSpeed install instructions, you could try putting a requirements.txt here with the pinned version you want, and then adding a note in the README telling users to install from it.
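For example, the pinned file could look like this (the version below is a placeholder, not a tested pin):

```
# requirements.txt for DeepSpeed inference benchmarking
# (version is illustrative; pin whatever version was actually tested)
deepspeed==0.8.3
```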
Below we include a script, YAMLs, and a README for benchmarking DeepSpeed inference.