xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Save and load dynamically quantized model #99

Open: roman-dobrov opened this issue 7 months ago

roman-dobrov commented 7 months ago

Hello! First of all, great work on instructor.

I'd like to load an already-quantized model, to avoid the CPU/memory spikes that quantization itself causes at script startup.

I tried static quantization first, but it is not supported for SentenceTransformers with float16 or qint8. With dynamic quantization, I get the following error when trying to load a saved state_dict:

RuntimeError: Error(s) in loading state_dict for INSTRUCTOR:
        Unexpected key(s) in state_dict: "2.linear.scale", "2.linear.zero_point", "2.linear._packed_params.dtype", "2.linear._packed_params._packed_params".
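The error can be reproduced without downloading INSTRUCTOR. The sketch below uses a toy model as a stand-in, and its `Dense` class and layer sizes are hypothetical. Module index 2 exposes a `.linear` attribute so the keys match the `2.linear.*` pattern above. `quantize_dynamic` replaces each `nn.Linear` with a dynamically quantized module. That module's state_dict stores `scale`, `zero_point` and `_packed_params` instead of `weight` and `bias`, so a plain float model rejects those keys.

```python
import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


class Dense(nn.Module):
    """Hypothetical stand-in for the Dense head that appears at index 2."""

    def __init__(self, d_in: int, d_out: int) -> None:
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.linear(x))


def make_float_model() -> nn.Module:
    # Toy model: module 2 exposes "2.linear", mirroring the keys in the error.
    return nn.Sequential(nn.Linear(8, 8), nn.ReLU(), Dense(8, 4))


# Quantize every nn.Linear dynamically (int8 weights).
qmodel = quantize_dynamic(make_float_model(), {nn.Linear}, dtype=torch.qint8)
qstate = qmodel.state_dict()
print(sorted(k for k in qstate if k.startswith("2.linear")))

error_message = ""
try:
    # A fresh float model expects "2.linear.weight"/"2.linear.bias", not packed params.
    make_float_model().load_state_dict(qstate)
except RuntimeError as err:
    error_message = str(err)
    print(error_message.splitlines()[0])
```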

I tried two save methods, a direct torch.save(model.state_dict()) and saving a traced version with torch.jit.trace, but both result in the same error. Is there a way to save and load a quantized model?
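A traced module is normally persisted with `torch.jit.save` and restored with `torch.jit.load`. Restoring it this way rebuilds the quantized graph and weights directly, without `load_state_dict` and without re-running quantization. The sketch below assumes a toy tensor-in/tensor-out model. This thread does not confirm whether INSTRUCTOR itself traces cleanly, since its modules take dictionaries of tokenized features rather than a single tensor.

```python
import io

import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic

# Hypothetical toy model standing in for INSTRUCTOR.
float_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8)).eval()
qmodel = quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

example = torch.randn(2, 16)
traced = torch.jit.trace(qmodel, example)

# Save the whole traced program (graph plus packed int8 weights), not a state_dict.
buffer = io.BytesIO()  # stands in for a file path such as 'qmodel.pt'
torch.jit.save(traced, buffer)
buffer.seek(0)

# torch.jit.load needs neither the Python class nor a second quantize_dynamic call.
restored = torch.jit.load(buffer)
same = torch.allclose(restored(example), qmodel(example))
print(same)
```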

hongjin-su commented 6 months ago

Hi, thanks a lot for your interest in the INSTRUCTOR model!

The following works for me:

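from InstructorEmbedding import INSTRUCTOR
from torch.nn import Embedding, Linear
from torch.ao.quantization import quantize_dynamic
from torch.ao.quantization.qconfig import default_dynamic_qconfig, float_qparams_weight_only_qconfig

# Load the float model on CPU; dynamic quantization targets CPU inference.
model = INSTRUCTOR('hkunlp/instructor-large', device='cpu')

# Embedding layers use weight-only quantization with float qparams;
# Linear layers use the default dynamic config (int8 weights).
qconfig_dict = {Embedding: float_qparams_weight_only_qconfig, Linear: default_dynamic_qconfig}

qmodel = quantize_dynamic(model, qconfig_dict)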
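To see what this `qconfig_dict` does to the module tree, the same mapping can be applied to a toy model. This is a sketch under assumptions: `TinyEncoder` and its sizes are hypothetical and only contain the two layer types the mapping targets. Both layers are swapped for quantized modules, which is why the resulting state_dict can only be loaded into a model that has been quantized the same way.

```python
import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic
from torch.ao.quantization.qconfig import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
)


class TinyEncoder(nn.Module):
    """Hypothetical toy model with one Embedding and one Linear layer."""

    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(100, 16)
        self.proj = nn.Linear(16, 8)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(ids))


qconfig_dict = {
    nn.Embedding: float_qparams_weight_only_qconfig,
    nn.Linear: default_dynamic_qconfig,
}
qmodel = quantize_dynamic(TinyEncoder(), qconfig_dict)

# Both submodules are now quantized types with different state_dict keys.
print(type(qmodel.embed).__module__, type(qmodel.embed).__name__)
print(type(qmodel.proj).__module__, type(qmodel.proj).__name__)
print(sorted(qmodel.state_dict()))

out = qmodel(torch.tensor([1, 2, 3]))  # 1-D token ids -> (3, 8) floats
```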

Hope this helps!

roman-dobrov commented 6 months ago

@hongjin-su Thank you for your response! Does loading the quantized model work for you?

hongjin-su commented 6 months ago

Yeah, this seems to work:

>>> import torch
>>> a = torch.load('state.pt')
/home/linuxbrew/.linuxbrew/Cellar/python@3.11/3.11.6/lib/python3.11/site-packages/torch/_utils.py:376: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()

roman-dobrov commented 6 months ago

@hongjin-su And how do you turn it back into a usable model? torch.load returns an OrderedDict, which is just a state dict. I get the error mentioned above when calling load_state_dict, before I can actually use the model.
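A common eager-mode pattern is to rebuild the architecture, apply `quantize_dynamic` with the same spec, and then call `load_state_dict`. At that point the module types and keys match the saved state dict. The sketch below assumes a toy `nn.Sequential` in place of INSTRUCTOR, and `build_quantized` is a hypothetical helper. An in-memory buffer stands in for `'state.pt'`.

```python
import io

import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


def build_quantized(seed: int) -> nn.Module:
    # Hypothetical helper. With INSTRUCTOR, the float model here would be
    # INSTRUCTOR('hkunlp/instructor-large', device='cpu') instead of a toy model.
    torch.manual_seed(seed)
    float_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
    # The qconfig spec must match the one used before saving.
    return quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)


# Saving side: quantize once and store only the state_dict.
source = build_quantized(seed=0)
buffer = io.BytesIO()  # stands in for 'state.pt'
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# Loading side: build the same quantized skeleton (different random weights),
# then overwrite its packed weights with the saved ones.
target = build_quantized(seed=1)
state = torch.load(buffer, weights_only=False)  # a file we wrote ourselves
target.load_state_dict(state)

x = torch.randn(4, 16)
print(torch.allclose(source(x), target(x)))
```

This fixes the load error, but it still calls `quantize_dynamic` at startup. It may therefore not remove the CPU/memory spike the original question was about.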