a-ys opened this issue 2 months ago
Pinging @dacorvo for visibility.
@a-ys thank you for the issue. It makes total sense and was actually on my todo list for this week. I had also planned to keep the checkpoint dir, because:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
Feature request
Neuron 2.18 introduced support for loading weights in the SafeTensors format. Because SafeTensors files carry a tensor index and can be read per tensor, it is no longer essential to split model weights to conserve memory: weights can be loaded from .safetensors checkpoints straight onto the Neuron cores. See here for the relevant code within Transformers-NeuronX. A call to save_pretrained_split or save_split is no longer required when the weights are in SafeTensors format.
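To illustrate why splitting is unnecessary, here is a stdlib-only sketch that reads just the header of a .safetensors file. The 8-byte little-endian length prefix and the JSON tensor index are part of the published safetensors format; the demo file, tensor name, and values below are made up for illustration. Since every tensor's byte offsets are known up front, a loader can map each tensor individually without any pre-split checkpoint:

```python
import json
import struct
import tempfile
from pathlib import Path

def read_safetensors_header(path):
    # Per the safetensors format spec: the file starts with an 8-byte
    # little-endian u64 giving the JSON header length, followed by the
    # JSON header (tensor name -> dtype/shape/data_offsets), then the
    # raw tensor bytes. Offsets let a loader read tensors lazily.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Demo: hand-build a minimal .safetensors file with one 2x2 fp32 tensor.
header = {"weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
header_bytes = json.dumps(header).encode("utf-8")
tensor_bytes = bytes(16)  # 4 floats of zeros, just for the demo

demo_file = Path(tempfile.mkdtemp()) / "model.safetensors"
demo_file.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes)

meta = read_safetensors_header(demo_file)
```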
I propose that Optimum-Neuron support saving model weights without splitting them when the source weights are already .safetensors files. The current behavior is to call save_split by default; I propose instead copying the .safetensors weights into checkpoint/, keeping the layout consistent with the existing approach.
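A minimal sketch of the proposed export path, assuming a hypothetical helper (export_checkpoint and save_split_fn are illustrative names, not the Optimum-Neuron API): if the source directory already holds .safetensors shards, copy them into checkpoint/ unchanged; otherwise fall back to the existing split-save behavior.

```python
import shutil
import tempfile
from pathlib import Path

def export_checkpoint(model_dir, checkpoint_dir, save_split_fn=None):
    # Hypothetical sketch of the proposal: .safetensors shards are copied
    # verbatim into checkpoint/ (cheap, IOPS-light); anything else goes
    # through the existing split-save path.
    model_dir, checkpoint_dir = Path(model_dir), Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    shards = sorted(model_dir.glob("*.safetensors"))
    if shards:
        for shard in shards:
            shutil.copy2(shard, checkpoint_dir / shard.name)
        return "copied"
    if save_split_fn is not None:
        save_split_fn(model_dir, checkpoint_dir)  # current save_split behavior
    return "split"

# Demo with a fake one-shard model directory.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp()) / "checkpoint"
(src / "model.safetensors").write_bytes(b"\x00")
result = export_checkpoint(src, dst)
```

Keeping the checkpoint/ directory name means downstream loading code needs no changes: it simply finds .safetensors files there instead of split weights.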
I tested this format by manually replacing the split model in the checkpoint/ directory with .safetensors weights. This test showed that loading safetensors already works, which makes sense because Optimum calls neuronx_class.from_pretrained().
In summary, I am suggesting that Optimum-Neuron copy .safetensors weights into checkpoint/ instead of calling save_split when the source checkpoint is already in SafeTensors format.
Motivation
This change would improve model-loading performance whether loading happens ahead of time or just in time during model deployment. Saving a split model is heavily IOPS-bound and unfriendly to cloud storage architectures.
Your contribution
I am happy to help contribute a PR if this is not already being worked on.