neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Unable to load DeepSparseSentenceTransformer #1649

Open Capt4in-Levi opened 1 month ago

Capt4in-Levi commented 1 month ago

**Describe the bug**
I'm trying to replicate the sample code given in the DeepSparseSentenceTransformer documentation, but I get errors when executing it. The problem appears to be a version incompatibility between the installed modules, but I'm stuck trying to find which one exactly. Can you please help with this?

**Expected behavior**
DeepSparseSentenceTransformer loads the model without any errors.

**Environment**
Include all relevant environment information:

  1. OS [e.g. Ubuntu 18.04]: Linux-5.10.215-203.850.amzn2.x86_64-x86_64-with-glibc2.26
  2. Python version [e.g. 3.8]: 3.10.14
  3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: deepsparse-1.7.1
  4. ML framework version(s) [e.g. torch 1.7.1]: torch-2.1.0
  5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: onnx-1.14.1 onnxruntime-1.16.3 sparsezoo-1.7.0 sparsezoo-nightly-1.8.0.20240401
  6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:
    >>> import deepsparse.cpu
    >>> print(deepsparse.cpu.cpu_architecture())
    {'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 47185920, 'architecture': 'x86_64', 'available_cores_per_socket': 2, 'available_num_cores': 2, 'available_num_hw_threads': 2, 'available_num_numa': 1, 'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 1, 'bf16': False, 'cores_per_socket': 2, 'dotprod': False, 'i8mm': False, 'isa': 'avx2', 'num_cores': 2, 'num_hw_threads': 2, 'num_numa': 1, 'num_sockets': 1, 'threads_per_core': 1, 'vbmi': False, 'vbmi2': False, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz', 'vnni': False, 'zen1': False}

**To Reproduce**
Exact steps to reproduce the behavior:

    !pip install deepsparse[sentence_transformers]
    !pip install tf-keras  # prompted by deepsparse to install this

    
    from deepsparse.sentence_transformers import DeepSparseSentenceTransformer
    model = DeepSparseSentenceTransformer('neuralmagic/bge-small-en-v1.5-quant', export=False)

    # Our sentences we like to encode
    sentences = ['This framework generates embeddings for each input sentence', 'Sentences are passed as a list of string.', 'The quick brown fox jumps over the lazy dog.']

    # Sentences are encoded by calling model.encode()
    import time
    st = time.time()
    embeddings = model.encode(sentences)
    ed = time.time()
    print("time taken is : ", ed - st)

    # Print the embeddings
    for sentence, embedding in zip(sentences, embeddings):
        print("Sentence:", sentence)
        print("Embedding:", embedding.shape)
        print("")



**Errors**
RuntimeError: Failed to import optimum.deepsparse.modeling because of the following error (look up to see its traceback):
Failed to import optimum.exporters.onnx.__main__ because of the following error (look up to see its traceback):
cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils' (/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/pytorch_utils.py)
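
The last line of the traceback says that `transformers.pytorch_utils` does not provide `is_torch_less_than_1_11`, a name the installed `optimum` tries to import. A minimal sketch for confirming that directly, using only the standard library; `has_attribute` is a hypothetical helper name introduced here for illustration, not part of any of these packages:

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True/False if the module imports and has/lacks the attribute.

    Returns None when the module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


if __name__ == "__main__":
    # Module and attribute names are taken from the traceback above.
    print(has_attribute("transformers.pytorch_utils", "is_torch_less_than_1_11"))
```

If this prints `False`, the installed `transformers` and `optimum` releases are probably out of step with each other. The check does not say which pair of versions works.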

Capt4in-Levi commented 1 month ago

Sharing the working versions of torch, transformers, optimum, and deepsparse would be fine too, in case I'm missing something obvious. @mgoin
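
To make version comparisons easier, the installed versions can be collected in one place. This is a rough sketch using the standard library's `importlib.metadata`; the package list is an assumption based on the names mentioned in this issue, and `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata

# Distribution names mentioned in this issue; extend the list as needed.
PACKAGES = [
    "deepsparse",
    "torch",
    "transformers",
    "optimum",
    "onnx",
    "onnxruntime",
    "sparsezoo",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output from a failing environment next to one from a working environment should show which package differs.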

dilip467 commented 3 weeks ago

What is the time taken for one embedding generation?
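
One way to measure this once the model loads is to time repeated `encode` calls and divide by the number of sentences. The sketch below is only an illustration: `time_per_item` is a hypothetical helper, and the warmup and repeat counts are arbitrary assumptions rather than recommended settings.

```python
import time


def time_per_item(encode, items, warmup=1, repeats=5):
    """Return the best observed wall-clock seconds per item for encode(items)."""
    if not items:
        raise ValueError("items must not be empty")
    # Warmup runs are excluded from timing so one-time setup cost is not counted.
    for _ in range(warmup):
        encode(items)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(items)
        best = min(best, time.perf_counter() - start)
    return best / len(items)
```

With the code from the issue, this would be called as `time_per_item(model.encode, sentences)`. Passing a single-sentence list gives the latency for one embedding rather than a batch average.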