michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

Error in offline mode with `trust_remote_code`: SFR-Embedding-Mistral and nomic do not work without `einops` #185

Closed prasannakrish97 closed 1 month ago

prasannakrish97 commented 3 months ago

Model description

You mentioned that the SFR-Embedding-Mistral model is supported, along with all other Hugging Face embedding models (cf. nomic). However, neither is working:

infinity | ERROR 2024-03-21 14:35:59,554 infinity_emb ERROR: BetterTransformer is not available for model. The model type mistral is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']). Continue without bettertransformer modeling code.   acceleration.py:21
infinity | Traceback (most recent call last):
infinity |   File "/app/infinity_emb/transformer/acceleration.py", line 19, in to_bettertransformer
infinity |     model = BetterTransformer.transform(model)
infinity |   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
infinity |     return func(*args, **kwds)
infinity |   File "/app/.venv/lib/python3.10/site-packages/optimum/bettertransformer/transformation.py", line 234, in transform
infinity |     raise NotImplementedError(
infinity | NotImplementedError: The model type mistral is not yet supported to be used with BetterTransformer.

Open source status

Provide useful links for the implementation

No response

michaelfeil commented 3 months ago

Thanks for opening the issue. Did you really try to get nomic running?

I would not be concerned about the stacktrace of

 infinity | NotImplementedError: The model type mistral is not

It's just an informational warning, saying that the model already uses a good attention implementation for mistral and that no better one is available via the optimum package.
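For illustration, the non-fatal pattern behind that message can be sketched as a best-effort upgrade: attempt the BetterTransformer transform, and keep the plain model when the architecture is unsupported. This is a hedged sketch, not infinity's actual code; `try_accelerate` and `fake_transform` are hypothetical names.

```python
# Sketch of the best-effort acceleration pattern (hypothetical, not infinity's code).
def try_accelerate(model, transform):
    """Return an accelerated model if `transform` supports it, else the original."""
    try:
        return transform(model)
    except NotImplementedError as exc:
        # Unsupported architectures (e.g. `mistral`, `nomic_bert`) land here;
        # serving continues with the stock modeling code.
        print(f"Continue without bettertransformer modeling code: {exc}")
        return model

# Demo with a stand-in transform that rejects the model type:
def fake_transform(model):
    raise NotImplementedError("The model type mistral is not yet supported")

model = object()
accelerated = try_accelerate(model, fake_transform)
assert accelerated is model  # the original model is served unchanged
```

The traceback in the log is printed by the same `except` branch before the server moves on, which is why startup still completes.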

nomic

python3 -m venv venv
source ./venv/bin/activate
pip install infinity_emb[all]
pip install einops # einops is a package required just by the custom code of nomic.
infinity_emb --model-name-or-path nomic-ai/nomic-embed-text-v1.5
(.venv) (base) michael@michael-laptop:~/infinity/libs/infinity_emb$ infinity_emb --model-name-or-path nomic-ai/nomic-embed-text-v1.5
INFO:     Started server process [426215]
INFO:     Waiting for application startup.
INFO     2024-03-30 09:31:45,673 infinity_emb INFO: model=`nomic-ai/nomic-embed-text-v1.5` selected, using engine=`torch` and device=`None`        select_model.py:54
INFO     2024-03-30 09:31:46,118 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer:                      SentenceTransformer.py:107
         nomic-ai/nomic-embed-text-v1.5                                                                                                                              
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 547M/547M [00:25<00:00, 21.9MB/s]
WARNING  2024-03-30 09:32:14,036                                                                                                        modeling_hf_nomic_bert.py:357
         transformers_modules.nomic-ai.nomic-embed-text-v1-unsupervised.3916676c856f1e25a4cc7a4e0ac740ea6ca9723a.modeling_hf_nomic_bert                              
         WARNING: <All keys matched successfully>                                                                                                                    
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1.19k/1.19k [00:00<00:00, 8.19MB/s]
vocab.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.87MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 2.94MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 695/695 [00:00<00:00, 5.14MB/s]
1_Pooling/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 286/286 [00:00<00:00, 1.99MB/s]
INFO     2024-03-30 09:32:16,061 sentence_transformers.SentenceTransformer INFO: Use pytorch device_name: cuda                             SentenceTransformer.py:213
INFO     2024-03-30 09:32:16,502 infinity_emb INFO: Adding optimizations via Huggingface optimum.                                                  acceleration.py:17
ERROR    2024-03-30 09:32:16,503 infinity_emb ERROR: BetterTransformer is not available for model. The model type nomic_bert is not yet supported  acceleration.py:21
         to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this                       
         model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot',                   
         'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj',                       
         'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet',                           
         'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos'])..                   
         Continue without bettertransformer modeling code.                                                                                                           
         Traceback (most recent call last):                                                                                                                          
           File "/home/michael/infinity/libs/infinity_emb/infinity_emb/transformer/acceleration.py", line 19, in to_bettertransformer                                
             model = BetterTransformer.transform(model)                                                                                                              
           File "/usr/lib/python3.10/contextlib.py", line 79, in inner                                                                                               
             return func(*args, **kwds)                                                                                                                              
           File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/optimum/bettertransformer/transformation.py", line                      
         234, in transform                                                                                                                                           
             raise NotImplementedError(                                                                                                                              
         NotImplementedError: The model type nomic_bert is not yet supported to be used with BetterTransformer. Feel free to open an issue at                        
         https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are:                            
         dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen',                            
         'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',                             
         'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter',                         
         'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).                                                                   
INFO     2024-03-30 09:32:16,510 infinity_emb INFO: Switching to half() precision (cuda: fp16).                                            sentence_transformer.py:73
INFO     2024-03-30 09:32:17,047 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=1                                select_model.py:77
                 5.65     ms tokenization                                                                                                                            
                 13.25    ms inference                                                                                                                               
                 0.26     ms post-processing                                                                                                                         
                 19.16    ms total                                                                                                                                   
         embeddings/sec: 1670.14                                                                                                                                     
INFO     2024-03-30 09:32:18,570 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=512                              select_model.py:83
                 14.14    ms tokenization                                                                                                                            
                 13.47    ms inference                                                                                                                               
                 726.95   ms post-processing                                                                                                                         
                 754.57   ms total                                                                                                                                   
         embeddings/sec: 42.41                                                                                                                                       
INFO     2024-03-30 09:32:18,572 infinity_emb INFO: model warmed up, between 42.41-1670.14 embeddings/sec at batch_size=32                         select_model.py:84
INFO     2024-03-30 09:32:18,574 infinity_emb INFO: creating batching engine                                                                     batch_handler.py:392
INFO     2024-03-30 09:32:18,575 infinity_emb INFO: ready to batch requests.                                                                     batch_handler.py:249
INFO     2024-03-30 09:32:18,577 infinity_emb INFO:                                                                                             infinity_server.py:64

         ♾️  Infinity - Embedding Inference Server                                                                                                                    
         MIT License; Copyright (c) 2023 Michael Feil                                                                                                                
         Version 0.0.31                                                                                                                                              

         Open the Docs via Swagger UI:                                                                                                                               
         http://0.0.0.0:7997/docs                                                                                                                                    

         Access model via 'GET':                                                                                                                                     
         curl http://0.0.0.0:7997/models                                                                                                                             

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
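Once Uvicorn reports the server is running, it can be queried over HTTP. A minimal client sketch, assuming the OpenAI-compatible `POST /embeddings` endpoint and the port and model name shown in the log above; `embed_request` is a hypothetical helper, and the base URL should be adjusted to your deployment.

```python
import json
import urllib.request

def embed_request(texts, model="nomic-ai/nomic-embed-text-v1.5",
                  base_url="http://0.0.0.0:7997"):
    # POST /embeddings with an OpenAI-style payload; returns the parsed JSON body.
    payload = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Offline sanity check of the payload shape (no server required):
body = json.dumps({"model": "nomic-ai/nomic-embed-text-v1.5",
                   "input": ["hello world"]})
assert json.loads(body)["input"] == ["hello world"]
```

The Swagger UI at http://0.0.0.0:7997/docs (see the banner above) shows the exact schema served by your version.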

Mistral

@prasannakrish97 Can you try running the above commands and post the output here?

prasannakrish97 commented 3 months ago

Hello, we are using the Docker image 0.0.31. We install our models (nomic-embed-text-v1 and nomic-embed-text-v1.5) locally (/data) with no internet access; einops is 0.7.0. SFR-Embedding-Mistral works as intended once we get past the warning you said to ignore, but both nomic models fail with the same error after that warning.

However, we're encountering the following problem for nomic (nota bene: the same nomic model works well locally with Text Embeddings Inference, but not with infinity):

infinity-nomic_1  | INFO:     Started server process [1]
infinity-nomic_1  | INFO:     Waiting for application startup.
infinity-nomic_1  | INFO     2024-04-05 08:52:36,666 infinity_emb INFO:           select_model.py:54
infinity-nomic_1  |          model=`/data` selected, using engine=`torch` and
infinity-nomic_1  |          device=`None`
infinity-nomic_1  | INFO     2024-04-05 08:52:36,678                      SentenceTransformer.py:107
infinity-nomic_1  |          sentence_transformers.SentenceTransformer
infinity-nomic_1  |          INFO: Load pretrained SentenceTransformer:
infinity-nomic_1  |          /data
infinity-nomic_1  | WARNING  2024-04-05 08:52:42,469                   modeling_hf_nomic_bert.py:357
infinity-nomic_1  |          transformers_modules.data.modeling_hf_nom
infinity-nomic_1  |          ic_bert WARNING: <All keys matched
infinity-nomic_1  |          successfully>
infinity-nomic_1  | INFO     2024-04-05 08:52:42,536                      SentenceTransformer.py:213
infinity-nomic_1  |          sentence_transformers.SentenceTransformer
infinity-nomic_1  |          INFO: Use pytorch device_name: cpu
infinity-nomic_1  | INFO     2024-04-05 08:52:42,560 infinity_emb INFO: Adding    acceleration.py:17
infinity-nomic_1  |          optimizations via Huggingface optimum.
infinity-nomic_1  | ERROR    2024-04-05 08:52:42,562 infinity_emb ERROR:          acceleration.py:21
infinity-nomic_1  |          BetterTransformer is not available for model. The
infinity-nomic_1  |          model type nomic_bert is not yet supported to be
infinity-nomic_1  |          used with BetterTransformer. Feel free to open an
infinity-nomic_1  |          issue at
infinity-nomic_1  |          https://github.com/huggingface/optimum/issues if you
infinity-nomic_1  |          would like this model type to be supported.
infinity-nomic_1  |          Currently supported models are: dict_keys(['albert',
infinity-nomic_1  |          'bark', 'bart', 'bert', 'bert-generation',
infinity-nomic_1  |          'blenderbot', 'bloom', 'camembert', 'blip-2',
infinity-nomic_1  |          'clip', 'codegen', 'data2vec-text', 'deit',
infinity-nomic_1  |          'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2',
infinity-nomic_1  |          'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',
infinity-nomic_1  |          'm2m_100', 'marian', 'markuplm', 'mbart', 'opt',
infinity-nomic_1  |          'pegasus', 'rembert', 'prophetnet', 'roberta',
infinity-nomic_1  |          'roc_bert', 'roformer', 'splinter', 'tapas', 't5',
infinity-nomic_1  |          'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2',
infinity-nomic_1  |          'xlm-roberta', 'yolos']).. Continue without
infinity-nomic_1  |          bettertransformer modeling code.
infinity-nomic_1  |          Traceback (most recent call last):
infinity-nomic_1  |            File
infinity-nomic_1  |          "/app/infinity_emb/transformer/acceleration.py",
infinity-nomic_1  |          line 19, in to_bettertransformer
infinity-nomic_1  |              model = BetterTransformer.transform(model)
infinity-nomic_1  |            File "/usr/lib/python3.10/contextlib.py", line 79,
infinity-nomic_1  |          in inner
infinity-nomic_1  |              return func(*args, **kwds)
infinity-nomic_1  |            File
infinity-nomic_1  |          "/app/.venv/lib/python3.10/site-packages/optimum/bet
infinity-nomic_1  |          tertransformer/transformation.py", line 234, in
infinity-nomic_1  |          transform
infinity-nomic_1  |              raise NotImplementedError(
infinity-nomic_1  |          NotImplementedError: The model type nomic_bert is
infinity-nomic_1  |          not yet supported to be used with BetterTransformer.
infinity-nomic_1  |          Feel free to open an issue at
infinity-nomic_1  |          https://github.com/huggingface/optimum/issues if you
infinity-nomic_1  |          would like this model type to be supported.
infinity-nomic_1  |          Currently supported models are: dict_keys(['albert',
infinity-nomic_1  |          'bark', 'bart', 'bert', 'bert-generation',
infinity-nomic_1  |          'blenderbot', 'bloom', 'camembert', 'blip-2',
infinity-nomic_1  |          'clip', 'codegen', 'data2vec-text', 'deit',
infinity-nomic_1  |          'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2',
infinity-nomic_1  |          'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',
infinity-nomic_1  |          'm2m_100', 'marian', 'markuplm', 'mbart', 'opt',
infinity-nomic_1  |          'pegasus', 'rembert', 'prophetnet', 'roberta',
infinity-nomic_1  |          'roc_bert', 'roformer', 'splinter', 'tapas', 't5',
infinity-nomic_1  |          'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2',
infinity-nomic_1  |          'xlm-roberta', 'yolos']).
infinity-nomic_1  | ERROR:    Traceback (most recent call last):
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 677, in lifespan
infinity-nomic_1  |     async with self.lifespan_context(app) as maybe_state:
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 566, in __aenter__
infinity-nomic_1  |     await self._router.startup()
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 654, in startup
infinity-nomic_1  |     await handler()
infinity-nomic_1  |   File "/app/infinity_emb/infinity_server.py", line 62, in _startup
infinity-nomic_1  |     app.model = AsyncEmbeddingEngine.from_args(engine_args)
infinity-nomic_1  |   File "/app/infinity_emb/engine.py", line 49, in from_args
infinity-nomic_1  |     engine = cls(**asdict(engine_args), _show_deprecation_warning=False)
infinity-nomic_1  |   File "/app/infinity_emb/engine.py", line 40, in __init__
infinity-nomic_1  |     self._model, self._min_inference_t, self._max_inference_t = select_model(
infinity-nomic_1  |   File "/app/infinity_emb/inference/select_model.py", line 68, in select_model
infinity-nomic_1  |     loaded_engine.warmup(batch_size=engine_args.batch_size, n_tokens=1)
infinity-nomic_1  |   File "/app/infinity_emb/transformer/abstract.py", line 55, in warmup
infinity-nomic_1  |     return run_warmup(self, inp)
infinity-nomic_1  |   File "/app/infinity_emb/transformer/abstract.py", line 105, in run_warmup
infinity-nomic_1  |     embed = model.encode_core(feat)
infinity-nomic_1  |   File "/app/infinity_emb/transformer/embedder/sentence_transformer.py", line 97, in encode_core
infinity-nomic_1  |     out_features = self.forward(features)["sentence_embedding"]
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
infinity-nomic_1  |     input = module(input)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
infinity-nomic_1  |     return self._call_impl(*args, **kwargs)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
infinity-nomic_1  |     return forward_call(*args, **kwargs)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 98, in forward
infinity-nomic_1  |     output_states = self.auto_model(**trans_features, return_dict=False)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
infinity-nomic_1  |     return self._call_impl(*args, **kwargs)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
infinity-nomic_1  |     return forward_call(*args, **kwargs)
infinity-nomic_1  | TypeError: NomicBertModel.forward() got an unexpected keyword argument 'return_dict'
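The final `TypeError` can be reproduced in isolation: sentence-transformers calls `self.auto_model(**trans_features, return_dict=False)`, and a custom `forward` that declares neither `return_dict` nor `**kwargs` fails in exactly this way. A minimal sketch, where `StubNomicBertModel` is a hypothetical stand-in rather than Nomic's actual modeling code:

```python
import inspect

class StubNomicBertModel:
    # Custom forward without a `return_dict` parameter and without `**kwargs`,
    # mimicking the signature mismatch behind the error above.
    def forward(self, input_ids=None, attention_mask=None):
        return (input_ids, attention_mask)

    __call__ = forward

model = StubNomicBertModel()
try:
    model(input_ids=[1], attention_mask=[1], return_dict=False)
    raised = False
except TypeError:
    # Mirrors: forward() got an unexpected keyword argument 'return_dict'
    raised = True
assert raised

# The mismatch is visible up front by inspecting the signature:
assert "return_dict" not in inspect.signature(model.forward).parameters
```

This suggests the failure comes from the pinned revision of the custom modeling code, not from infinity itself: a revision whose `forward` accepts `return_dict` (or `**kwargs`) would not raise here.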
michaelfeil commented 2 months ago

Okay, I have shown above that it is possible to run infinity with nomic. Therefore, here is what I suggest:

  1. Try running these commands again. Also delete all of your pre-existing huggingface_hub caches and set an explicit commit. nomic runs with custom modeling code, so be aware that not pinning a specific revision means you will execute whatever code they publish in any future version.

    python3 -m venv venv
    source ./venv/bin/activate
    pip install infinity_emb[all]
    pip install einops # einops is required only by nomic's custom modeling code.
    infinity_emb --model-name-or-path nomic-ai/nomic-embed-text-v1.5 --revision some_specific_revision
  2. #195: I plan to make it easier to "bake a model into a Dockerfile". Too many people have had issues with that, and it requires too much knowledge of compatible huggingface_hub / sentence_transformers versions, cache paths, etc. Perhaps give it a try once it's merged.

michaelfeil commented 2 months ago

https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/discussions/16#6616ca28401ac37f878f4701