a-ys opened this issue 2 months ago
Pinging @dacorvo for visibility.
@a-ys thank you for the issue. It makes total sense and was actually on my todo list for this week. I had also planned to keep the checkpoint dir, because:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
Feature request
Neuron 2.18 introduced support for loading weights in the SafeTensors format. Because SafeTensors files carry a tensor index and can be read per tensor, it is no longer essential to split model weights to conserve memory: weights can be loaded from .safetensors checkpoints straight onto the Neuron cores. See here for the relevant code within Transformers-NeuronX. A call to save_pretrained_split or save_split is no longer required when the weights are in SafeTensors format.
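To illustrate why splitting is unnecessary, here is a stdlib-only sketch that reads just the header of a .safetensors file. The 8-byte little-endian length prefix and the JSON tensor index are part of the published safetensors format; the demo file, tensor name, and values below are made up for illustration. Since every tensor's byte offsets are known up front, a loader can map each tensor individually without any pre-split checkpoint:

```python
import json
import struct
import tempfile
from pathlib import Path

def read_safetensors_header(path):
    # Per the safetensors format spec: the file starts with an 8-byte
    # little-endian u64 giving the JSON header length, followed by the
    # JSON header (tensor name -> dtype/shape/data_offsets), then the
    # raw tensor bytes. Offsets let a loader read tensors lazily.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Demo: hand-build a minimal .safetensors file with one 2x2 fp32 tensor.
header = {"weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
header_bytes = json.dumps(header).encode("utf-8")
tensor_bytes = bytes(16)  # 4 floats of zeros, just for the demo

demo_file = Path(tempfile.mkdtemp()) / "model.safetensors"
demo_file.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes)

meta = read_safetensors_header(demo_file)
```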
I propose that Optimum-Neuron support saving model weights without splitting them when the source weights are already .safetensors files. The current behavior is to call save_split by default; I propose instead copying the .safetensors weights into checkpoint/, keeping the layout consistent with the existing approach.
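A minimal sketch of the proposed export path, assuming a hypothetical helper (export_checkpoint and save_split_fn are illustrative names, not the Optimum-Neuron API): if the source directory already holds .safetensors shards, copy them into checkpoint/ unchanged; otherwise fall back to the existing split-save behavior.

```python
import shutil
import tempfile
from pathlib import Path

def export_checkpoint(model_dir, checkpoint_dir, save_split_fn=None):
    # Hypothetical sketch of the proposal: .safetensors shards are copied
    # verbatim into checkpoint/ (cheap, IOPS-light); anything else goes
    # through the existing split-save path.
    model_dir, checkpoint_dir = Path(model_dir), Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    shards = sorted(model_dir.glob("*.safetensors"))
    if shards:
        for shard in shards:
            shutil.copy2(shard, checkpoint_dir / shard.name)
        return "copied"
    if save_split_fn is not None:
        save_split_fn(model_dir, checkpoint_dir)  # current save_split behavior
    return "split"

# Demo with a fake one-shard model directory.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp()) / "checkpoint"
(src / "model.safetensors").write_bytes(b"\x00")
result = export_checkpoint(src, dst)
```

Keeping the checkpoint/ directory name means downstream loading code needs no changes: it simply finds .safetensors files there instead of split weights.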
I tested this format by manually replacing the split model in the checkpoint/ directory with .safetensors weights. This test showed that loading safetensors already works, which makes sense because Optimum calls neuronx_class.from_pretrained().
In summary, I am suggesting that Optimum-Neuron copy .safetensors weights into checkpoint/ instead of calling save_split when the source checkpoint is already in SafeTensors format.
Motivation
This change would improve model-loading performance whether loading happens ahead of time or just in time during model deployment. Saving a split model is heavily IOPS-bound and unfriendly to cloud storage architectures.
Your contribution
I am happy to help contribute a PR if this is not already being worked on.