Capt4in-Levi commented 6 months ago

**Describe the bug**
I'm trying to replicate the sample code given in the DeepSparseSentenceTransformer documentation, but I get errors when executing it. The problem appears to be a version incompatibility between the modules, but I can't work out exactly which one is at fault. Can you please help with this?

**Expected behavior**
DeepSparseSentenceTransformer models load without any errors.

**Environment**
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Linux-5.10.215-203.850.amzn2.x86_64-x86_64-with-glibc2.26
Python version [e.g. 3.8]: 3.10.14
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: deepsparse-1.7.1
ML framework version(s) [e.g. torch 1.7.1]: torch-2.1.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: onnx-1.14.1 onnxruntime-1.16.3 sparsezoo-1.7.0 sparsezoo-nightly-1.8.0.20240401

CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

```
>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())
{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 47185920, 'architecture': 'x86_64', 'available_cores_per_socket': 2, 'available_num_cores': 2, 'available_num_hw_threads': 2, 'available_num_numa': 1, 'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 1, 'bf16': False, 'cores_per_socket': 2, 'dotprod': False, 'i8mm': False, 'isa': 'avx2', 'num_cores': 2, 'num_hw_threads': 2, 'num_numa': 1, 'num_sockets': 1, 'threads_per_core': 1, 'vbmi': False, 'vbmi2': False, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz', 'vnni': False, 'zen1': False}
```

**To Reproduce**
Exact steps to reproduce the behavior:

    !pip install deepsparse[sentence_transformers]
    !pip install tf-keras  # prompted by deepsparse to install this

    from deepsparse.sentence_transformers import DeepSparseSentenceTransformer
    model = DeepSparseSentenceTransformer('neuralmagic/bge-small-en-v1.5-quant', export=False)

    # Our sentences we like to encode
    sentences = ['This framework generates embeddings for each input sentence', 'Sentences are passed as a list of string.', 'The quick brown fox jumps over the lazy dog.']

    # Sentences are encoded by calling model.encode()
    import time
    st = time.time()
    embeddings = model.encode(sentences)
    ed = time.time()
    print("time taken is : ", ed - st)

    # Print the embeddings
    for sentence, embedding in zip(sentences, embeddings):
        print("Sentence:", sentence)
        print("Embedding:", embedding.shape)
        print("")



**Errors**
RuntimeError: Failed to import optimum.deepsparse.modeling because of the following error (look up to see its traceback):
Failed to import optimum.exporters.onnx.__main__ because of the following error (look up to see its traceback):
cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils' (/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/pytorch_utils.py)
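
Since the traceback points at a symbol missing from `transformers.pytorch_utils`, a quick way to narrow this down is to list the installed versions of the packages involved and check whether that symbol exists in the current environment. The following is a minimal diagnostic sketch, not an official DeepSparse tool. The helper names `installed_versions` and `has_symbol` are hypothetical, and the distribution names in `PACKAGES` (in particular `optimum-deepsparse`) are assumptions about how the packages are published; anything not installed under that name is simply reported as missing.

```python
import importlib
from importlib import metadata

# Assumed distribution names for the packages involved in the traceback.
PACKAGES = [
    "torch",
    "transformers",
    "optimum",
    "optimum-deepsparse",
    "deepsparse",
    "sentence-transformers",
    "onnx",
    "onnxruntime",
    "sparsezoo",
]


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_symbol(module_name, symbol):
    """Return True if the module imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A missing or broken module counts as "symbol not available".
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
    print(
        "is_torch_less_than_1_11 available:",
        has_symbol("transformers.pytorch_utils", "is_torch_less_than_1_11"),
    )
```

If the last line prints `False`, the installed transformers release does not provide the helper that the installed optimum release tries to import, which would be consistent with the error above.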


Capt4in-Levi commented 6 months ago

Sharing the working versions of torch, transformers, optimum and deepsparse would be fine too, in case I'm missing something obvious. @mgoin

dilip467 commented 5 months ago

What is the time taken for one embedding generation?
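
One way to measure this is to average several calls after a warm-up run, rather than timing a single call as in the original snippet. This is a rough sketch only: `time_encode` is a hypothetical helper, and `fake_encode` is a stand-in so the example runs without DeepSparse installed. With a working model, `model.encode` would be passed in its place, assuming it accepts a list of strings as in the reproduction steps.

```python
import time


def time_encode(encode, sentences, warmup=1, repeats=5):
    """Return the average wall-clock seconds per call of encode(sentences)."""
    # Warm-up calls are excluded so one-time setup cost does not skew the average.
    for _ in range(warmup):
        encode(sentences)
    start = time.perf_counter()
    for _ in range(repeats):
        encode(sentences)
    elapsed = time.perf_counter() - start
    return elapsed / repeats


def fake_encode(sentences):
    # Hypothetical stand-in for model.encode; returns one dummy vector per sentence.
    return [[float(len(s))] for s in sentences]


per_call = time_encode(fake_encode, ["The quick brown fox jumps over the lazy dog."])
print(f"average seconds per call: {per_call:.6f}")
```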

neuralmagic / deepsparse

Unable to load DeepSparseSentenceTransformer #1649
