harubaru closed this 2 years ago
Quantization works with tensorized models; however, Softprompting does not, because FrozenBNBEmbedding
is incompatible with torch.nn.Embedding
during the embedding-resize operation that Softprompts require.
More work would be needed to support that, but it is out of scope for this PR.
Following this blog post, a zero-copy loading strategy was implemented, which allows for faster and more memory-efficient model loading:
https://medium.com/ibm-data-ai/how-to-load-pytorch-models-340-times-faster-with-ray-8b
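The core idea behind the zero-copy approach described in that post is that `torch.from_numpy` wraps a NumPy buffer without copying it, so tensors serialized as plain arrays can be rehydrated essentially for free. The sketch below is a minimal illustration of that mechanism, not the actual code in this PR; the helper names are hypothetical.

```python
import numpy as np
import torch

def tensor_to_numpy(t: torch.Tensor) -> np.ndarray:
    # .numpy() on a CPU tensor shares the underlying buffer (no copy).
    return t.detach().cpu().numpy()

def numpy_to_tensor(a: np.ndarray) -> torch.Tensor:
    # torch.from_numpy wraps the array's memory directly (no copy).
    return torch.from_numpy(a)

orig = torch.arange(6, dtype=torch.float32)
arr = tensor_to_numpy(orig)
restored = numpy_to_tensor(arr)

# Because no copy was made, mutating the array is visible in the
# tensor reconstructed from it.
arr[0] = 42.0
assert restored[0].item() == 42.0
```

In practice the blog post combines this with Ray's shared-memory object store, so multiple worker processes can map the same weights instead of each deserializing their own copy.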
Here are some results from running the tensorization code on my crappy laptop: