dimitry12 opened this issue 1 year ago
Good question! Hugging Face has a library, Optimum, that can do optimizations such as pruning and quantization. It seems to me that these kinds of optimizations that require "model surgery" really belong in Optimum, but I'm not aware of any plans to add these particular optimizations. It's definitely something worth considering, though (and it could be prototyped in Exporters).
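For concreteness, this is roughly what one of those existing Optimum optimizations looks like: dynamic INT8 quantization through the ONNX Runtime backend. Treat it as a sketch; the exact argument names (e.g. `export=True`, `avx512_vnni`) may differ between Optimum versions.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export a Transformers checkpoint to ONNX through Optimum.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

# Apply dynamic INT8 quantization -- one of the optimizations Optimum
# already ships for its ONNX Runtime backend.
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="distilbert_sst2_int8", quantization_config=qconfig)
```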
> these kinds of optimizations that require "model surgery" really belong in Optimum, but I'm not aware of any plans to add these particular optimizations. It's definitely something worth considering, though (and it could be prototyped in Exporters)
Makes sense! I can't commit to a particular timeline but I definitely plan to work in this direction. If that's ok, I will keep this issue open to post updates.
ane_transformers (https://github.com/apple/ml-ane-transformers and https://machinelearning.apple.com/research/neural-engine-transformers) suggests weight-compatible changes to transformers that allow the ops to map better onto the ANE, resulting in significant performance improvements. @hollance, do you think these optimizations "belong" in 🤗 Exporters? If yes, how do you envision their implementation: within the CoreMLConfig abstraction or somewhere else?
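For context on what those weight-compatible changes are: the core of the ane_transformers recipe is to keep activations in a channels-first (B, C, 1, S) layout and to replace every nn.Linear with an equivalent 1x1 nn.Conv2d, so the existing checkpoint weights can be reused without retraining. A minimal PyTorch sketch of that swap (not the library's own implementation; a real pass would also have to adapt the surrounding attention and layer-norm code to the 4D layout):

```python
import torch
import torch.nn as nn


def linear_to_conv2d(linear: nn.Linear) -> nn.Conv2d:
    """Build a 1x1 Conv2d that is weight-compatible with the given Linear.

    ane_transformers keeps activations in a (B, C, 1, S) channels-first
    layout so the ops map well onto the Neural Engine; the Linear weight of
    shape (out, in) is reused as a Conv2d kernel of shape (out, in, 1, 1).
    """
    conv = nn.Conv2d(
        in_channels=linear.in_features,
        out_channels=linear.out_features,
        kernel_size=1,
        bias=linear.bias is not None,
    )
    with torch.no_grad():
        conv.weight.copy_(
            linear.weight.view(linear.out_features, linear.in_features, 1, 1)
        )
        if linear.bias is not None:
            conv.bias.copy_(linear.bias)
    return conv


def swap_linears(module: nn.Module) -> None:
    """Recursively replace every nn.Linear in a model with its 1x1 Conv2d twin."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, linear_to_conv2d(child))
        else:
            swap_linears(child)
```

A pass like `swap_linears(model)` would run right before the Core ML conversion step, which is what makes me wonder whether CoreMLConfig is the right place to hook it in.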