[BUG] Varying dimensions for distiluse-base-multilingual-cased-v1

Jon-AtAWS commented 2 months ago

What is the bug? Frequent Torch errors in the OpenSearch log during bulk upload while using distiluse-base-multilingual-cased-v1

The weird thing is that I'm seeing different dimensions in the errors:

opensearch-ml-node     | RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (518) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (518) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (580) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (580) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (516) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (516) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (515) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (515) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (539) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (539) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (539) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (539) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (539) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (539) must match the size of tensor b (512) at non-singleton dimension 1
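
One way to see the pattern in these errors is to tally the two sizes in each message. The sketch below uses only the Python standard library; `MISMATCH` and `summarize_mismatches` are hypothetical names for illustration, not part of OpenSearch or ML Commons. In the lines above, tensor b stays at 512 while tensor a varies, which suggests the varying number depends on the input rather than on the model.

```python
import re
from collections import Counter

# Pattern for the PyTorch shape-mismatch message shown in the log above.
MISMATCH = re.compile(
    r"The size of tensor a \((\d+)\) must match the size of tensor b \((\d+)\)"
)

def summarize_mismatches(log_text):
    """Count how often each (tensor a, tensor b) size pair appears in log_text."""
    pairs = Counter()
    for line in log_text.splitlines():
        match = MISMATCH.search(line)
        if match:
            pairs[(int(match.group(1)), int(match.group(2)))] += 1
    return pairs

# Three of the log lines from this report, used as sample input.
sample = """\
opensearch-ml-node     | RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (518) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     | RuntimeError: The size of tensor a (518) must match the size of tensor b (512) at non-singleton dimension 1
"""

summary = summarize_mismatches(sample)
sizes_a = sorted({a for a, _ in summary})  # distinct sizes reported for tensor a
sizes_b = sorted({b for _, b in summary})  # distinct sizes reported for tensor b
print(sizes_a, sizes_b)
```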

How can one reproduce the bug? Steps to reproduce the behavior:

I'm running data from the Amazon Product Q&A data set through the _bulk API. Other models work fine so there's an existence proof that the code is not at fault.

Copy-pasting a bunch of code here.

    settings['settings']['index']['knn'] = True
    settings['settings']['default_pipeline'] = PIPELINE_NAME

    {
      "description": "Pipeline for processing chunks",
      "processors" : [
        {
          "text_embedding": {
            "model_id": f'{self.model_id()}',
            "field_map": {'chunk': 'chunk_embedding'}
          }
        }
      ]
    }

    "chunk_embedding": {
      "type": "knn_vector",
      "dimension": 512,
      "method": {
        "name": "hnsw",
        "space_type": "l2",
        "engine": "nmslib",
        "parameters": {
          "ef_construction": 128,
          "m": 24
        }
      }
    }
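
For readers trying to reproduce this, the fragments above might fit together roughly as follows. This is a minimal sketch, assuming the mapping lives under `mappings.properties` and that the source text is stored in a `chunk` text field; `build_index_body` and the pipeline name are hypothetical and were not part of the original code.

```python
PIPELINE_NAME = "chunk-pipeline"  # hypothetical; the original value is not shown

def build_index_body(pipeline_name, dimension=512):
    """Combine the settings and mapping fragments above into one index body."""
    return {
        "settings": {
            "index": {"knn": True},
            "default_pipeline": pipeline_name,
        },
        "mappings": {
            "properties": {
                # Assumed source field that the pipeline reads from.
                "chunk": {"type": "text"},
                # Target field that the text_embedding processor writes to.
                "chunk_embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "l2",
                        "engine": "nmslib",
                        "parameters": {"ef_construction": 128, "m": 24},
                    },
                },
            }
        },
    }

index_body = build_index_body(PIPELINE_NAME)
```

Note that `dimension` here is the length of the output vector, which is a separate quantity from the length of the input sequence.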

What is your host/environment?

OS: mac
Version: Ventura 13.6.6 (22G630)
Docker

Sample error:

opensearch-ml-node     | [2024-04-07T17:29:25,378][ERROR][o.o.m.e.a.DLModel        ] [opensearch-ml-node] Failed to inference TEXT_EMBEDDING model: wZmauY4BpvFYTNZ-aTWh
opensearch-ml-node     | java.security.PrivilegedActionException: null
opensearch-ml-node     |    at java.base/java.security.AccessController.doPrivileged(AccessController.java:575) ~[?:?]
opensearch-ml-node     |    at org.opensearch.ml.engine.algorithms.DLModel.predict(DLModel.java:81) [opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml-node     |    at org.opensearch.ml.task.MLPredictTaskRunner.lambda$predict$5(MLPredictTaskRunner.java:222) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml-node     |    at org.opensearch.ml.model.MLModelManager.trackPredictDuration(MLModelManager.java:1805) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml-node     |    at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:222) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml-node     |    at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executeTask$4(MLPredictTaskRunner.java:194) [opensearch-ml-2.12.0.0.jar:2.12.0.0]
opensearch-ml-node     |    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854) [opensearch-2.12.0.jar:2.12.0]
opensearch-ml-node     |    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
opensearch-ml-node     |    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
opensearch-ml-node     |    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-ml-node     | Caused by: ai.djl.translate.TranslateException: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter.
opensearch-ml-node     | Traceback of TorchScript, serialized code (most recent call last):
opensearch-ml-node     |   File "code/__torch__/sentence_transformers/SentenceTransformer.py", line 14, in forward
opensearch-ml-node     |     input_ids = input["input_ids"]
opensearch-ml-node     |     mask = input["attention_mask"]
opensearch-ml-node     |     _2 = (_0).forward(input_ids, mask, )
opensearch-ml-node     |           ~~~~~~~~~~~ <--- HERE
opensearch-ml-node     |     _3 = {"input_ids": input_ids, "attention_mask": mask, "token_embeddings": _2, "sentence_embedding": (_1).forward(_2, )}
opensearch-ml-node     |     return _3
opensearch-ml-node     |   File "code/__torch__/sentence_transformers/models/Transformer.py", line 11, in forward
opensearch-ml-node     |     mask: Tensor) -> Tensor:
opensearch-ml-node     |     auto_model = self.auto_model
opensearch-ml-node     |     _0 = (auto_model).forward(input_ids, mask, )
opensearch-ml-node     |           ~~~~~~~~~~~~~~~~~~~ <--- HERE
opensearch-ml-node     |     return _0
opensearch-ml-node     |   File "code/__torch__/transformers/models/distilbert/modeling_distilbert.py", line 13, in forward
opensearch-ml-node     |     transformer = self.transformer
opensearch-ml-node     |     embeddings = self.embeddings
opensearch-ml-node     |     _0 = (transformer).forward((embeddings).forward(input_ids, ), mask, )
opensearch-ml-node     |                                 ~~~~~~~~~~~~~~~~~~~ <--- HERE
opensearch-ml-node     |     return _0
opensearch-ml-node     | class Embeddings(Module):
opensearch-ml-node     |   File "code/__torch__/transformers/models/distilbert/modeling_distilbert.py", line 38, in forward
opensearch-ml-node     |     _3 = (word_embeddings).forward(input_ids, )
opensearch-ml-node     |     _4 = (position_embeddings).forward(input, )
opensearch-ml-node     |     input0 = torch.add(_3, _4)
opensearch-ml-node     |              ~~~~~~~~~ <--- HERE
opensearch-ml-node     |     _5 = (dropout).forward((LayerNorm).forward(input0, ), )
opensearch-ml-node     |     return _5
opensearch-ml-node     |
opensearch-ml-node     | Traceback of TorchScript, original code (most recent call last):
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/transformers/models/distilbert/modeling_distilbert.py(130): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/transformers/models/distilbert/modeling_distilbert.py(578): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /usr/local/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py(66): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/container.py(204): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/jit/_trace.py(976): trace_module
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/jit/_trace.py(759): trace
opensearch-ml-node     | /Volumes/workplace/opensearch-py-ml/src/opensearch-py-ml/opensearch_py_ml/ml_models/sentencetransformermodel.py(778): save_as_pt
opensearch-ml-node     | /Volumes/workplace/opensearch-py-ml/src/opensearch-py-ml/test.py(34): <module>
opensearch-ml-node     | RuntimeError: The size of tensor a (518) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     |
opensearch-ml-node     |    at ai.djl.inference.Predictor.batchPredict(Predictor.java:189) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.inference.Predictor.predict(Predictor.java:126) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at org.opensearch.ml.engine.algorithms.TextEmbeddingModel.predict(TextEmbeddingModel.java:33) ~[opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml-node     |    at org.opensearch.ml.engine.algorithms.DLModel.lambda$predict$0(DLModel.java:86) ~[opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml-node     |    at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
opensearch-ml-node     |    ... 9 more
opensearch-ml-node     | Caused by: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter.
opensearch-ml-node     | Traceback of TorchScript, serialized code (most recent call last):
opensearch-ml-node     |   File "code/__torch__/sentence_transformers/SentenceTransformer.py", line 14, in forward
opensearch-ml-node     |     input_ids = input["input_ids"]
opensearch-ml-node     |     mask = input["attention_mask"]
opensearch-ml-node     |     _2 = (_0).forward(input_ids, mask, )
opensearch-ml-node     |           ~~~~~~~~~~~ <--- HERE
opensearch-ml-node     |     _3 = {"input_ids": input_ids, "attention_mask": mask, "token_embeddings": _2, "sentence_embedding": (_1).forward(_2, )}
opensearch-ml-node     |     return _3
opensearch-ml-node     |   File "code/__torch__/sentence_transformers/models/Transformer.py", line 11, in forward
opensearch-ml-node     |     mask: Tensor) -> Tensor:
opensearch-ml-node     |     auto_model = self.auto_model
opensearch-ml-node     |     _0 = (auto_model).forward(input_ids, mask, )
opensearch-ml-node     |           ~~~~~~~~~~~~~~~~~~~ <--- HERE
opensearch-ml-node     |     return _0
opensearch-ml-node     |   File "code/__torch__/transformers/models/distilbert/modeling_distilbert.py", line 13, in forward
opensearch-ml-node     |     transformer = self.transformer
opensearch-ml-node     |     embeddings = self.embeddings
opensearch-ml-node     |     _0 = (transformer).forward((embeddings).forward(input_ids, ), mask, )
opensearch-ml-node     |                                 ~~~~~~~~~~~~~~~~~~~ <--- HERE
opensearch-ml-node     |     return _0
opensearch-ml-node     | class Embeddings(Module):
opensearch-ml-node     |   File "code/__torch__/transformers/models/distilbert/modeling_distilbert.py", line 38, in forward
opensearch-ml-node     |     _3 = (word_embeddings).forward(input_ids, )
opensearch-ml-node     |     _4 = (position_embeddings).forward(input, )
opensearch-ml-node     |     input0 = torch.add(_3, _4)
opensearch-ml-node     |              ~~~~~~~~~ <--- HERE
opensearch-ml-node     |     _5 = (dropout).forward((LayerNorm).forward(input0, ), )
opensearch-ml-node     |     return _5
opensearch-ml-node     |
opensearch-ml-node     | Traceback of TorchScript, original code (most recent call last):
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/transformers/models/distilbert/modeling_distilbert.py(130): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/transformers/models/distilbert/modeling_distilbert.py(578): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /usr/local/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py(66): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/container.py(204): forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1182): _slow_forward
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py(1194): _call_impl
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/jit/_trace.py(976): trace_module
opensearch-ml-node     | /Users/dhrubo/Library/Python/3.9/lib/python/site-packages/torch/jit/_trace.py(759): trace
opensearch-ml-node     | /Volumes/workplace/opensearch-py-ml/src/opensearch-py-ml/opensearch_py_ml/ml_models/sentencetransformermodel.py(778): save_as_pt
opensearch-ml-node     | /Volumes/workplace/opensearch-py-ml/src/opensearch-py-ml/test.py(34): <module>
opensearch-ml-node     | RuntimeError: The size of tensor a (518) must match the size of tensor b (512) at non-singleton dimension 1
opensearch-ml-node     |
opensearch-ml-node     |    at ai.djl.pytorch.jni.PyTorchLibrary.moduleRunMethod(Native Method) ~[pytorch-engine-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.pytorch.jni.IValueUtils.forward(IValueUtils.java:53) ~[pytorch-engine-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.pytorch.engine.PtSymbolBlock.forwardInternal(PtSymbolBlock.java:154) ~[pytorch-engine-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:79) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.nn.Block.forward(Block.java:127) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.inference.Predictor.predictInternal(Predictor.java:140) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.inference.Predictor.batchPredict(Predictor.java:180) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at ai.djl.inference.Predictor.predict(Predictor.java:126) ~[api-0.21.0.jar:?]
opensearch-ml-node     |    at org.opensearch.ml.engine.algorithms.TextEmbeddingModel.predict(TextEmbeddingModel.java:33) ~[opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml-node     |    at org.opensearch.ml.engine.algorithms.DLModel.lambda$predict$0(DLModel.java:86) ~[opensearch-ml-algorithms-2.12.0.0.jar:?]
opensearch-ml-node     |    at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
opensearch-ml-node     |    ... 9 more

navneet1v commented 2 months ago

@Jon-AtAWS The issue seems to be with the ML Commons plugin. I cannot transfer the issue to ML Commons. Can you please create an issue with ML Commons?

Jon-AtAWS commented 2 months ago

Actually, I may have figured it out. I got confused by the "512", which is also the dimension of the model. I believe the error is actually about the size of the chunk.

doc['chunk'] = ' '.join(doc['chunk'].split()[:500])

And I'm not seeing the problem anymore.

Jon-AtAWS commented 2 months ago

We can make this an enhancement request for a better error message instead.

xinyual commented 2 months ago

@Jon-AtAWS Hi, could you provide the request bodies you used to register the model and create the pipeline so we can try to reproduce this? The pretrained model applies truncation itself, so the input size should not be the problem.

Jon-AtAWS commented 2 months ago

Hi @xinyual,

I think the model was "huggingface/sentence-transformers/distiluse-base-multilingual-cased-v1". It might have been "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b" (in some runs, it was).

  # Requires: import json, logging, requests
  def _make_register_model_call(self, model_name):
    # Request body for the ML Commons model registration endpoint.
    data={
        "name": model_name,
        "version": "1.0.1",
        "model_format": "TORCH_SCRIPT",
        "model_group_id": self.model_group_id
    }
    # verify=False skips TLS certificate verification for the localhost cluster.
    resp = requests.post('https://localhost:9200/_plugins/_ml/models/_register',
                          data=json.dumps(data),
                          auth=(self.admin_user, self.admin_password),
                          verify=False,
                          headers={"Content-Type": "application/json"})
    if resp.status_code >= 300:
      logging.error(f'_register call for {model_name} returned bad status {resp.status_code}\n{resp.reason}')
      raise Exception(resp.text)
    # The response carries the ID of the registration task.
    return json.loads(resp.text)['task_id']

The pipeline config is like this:

  def _pipeline_config(self, pipeline_field_map=None):
    if not pipeline_field_map:
      pipeline_field_map = {'chunk': 'chunk_vector'}
    config = {
      "description": "Pipeline for processing chunks",
      "processors" : [
        {
          "text_embedding": {
            "model_id": f'{self.model_id()}',
            "field_map": pipeline_field_map
          }
        }
      ]
    }
    logging.info(config)
    return config

  def _add_neural_pipeline(self, pipeline_name='', pipeline_field_map=None):
    if not pipeline_name:
      raise Exception('add_neural_pipeline: pipeline name must be specified')
    pipeline_config = self._pipeline_config(pipeline_field_map=pipeline_field_map)
    logging.info('Adding neural pipeline...')
    self.os_client.ingest.put_pipeline(pipeline_name, body=pipeline_config)

And the full kNN setup was this:

  def setup_for_kNN(self, index_name='', index_settings='', pipeline_name=None, pipeline_field_map=None,
                    model_name='', model_dimensions=0):
    logging.info(f'Setup for KNN; ml model: {self.ml_model}; ml model group: {self.ml_model_group}')
    self.index_name = index_name
    self.pipeline_name = pipeline_name
    self.os_client.cluster.put_settings(body={"persistent" : {"plugins.ml_commons.only_run_on_ml_node" : "true"}})
    self.os_client.cluster.put_settings(body={"persistent" : {"plugins.ml_commons.model_access_control_enabled" : "false"}})
    self.os_client.cluster.put_settings(body={"persistent" : {"plugins.ml_commons.allow_registering_model_via_url" : "true"}})
    self.ml_model_group = MLModelGroup(self.os_client, self.ml_commons_client, admin_user=self._admin_user,
                                       admin_password=self._admin_password)
    time.sleep(1)
    self.ml_model = MLModel(self.os_client, self.ml_commons_client, self.ml_model_group.model_group_id(),
                            model_name=model_name, model_dimensions=model_dimensions,
                            admin_user=self._admin_user, admin_password=self._admin_password)
    self.clean_create_index(index_name=index_name, settings=index_settings)
    self._add_neural_pipeline(pipeline_name=pipeline_name, pipeline_field_map=pipeline_field_map)

MLModel and MLModelGroup are classes that wrap the calls to OpenSearch.

xinyual commented 2 months ago

Could you also provide the model and version that were in use when the error occurred? If it is huggingface/sentence-transformers/msmarco-distilbert-base-tas-b at version 1.0.1, the cause is that this version has no truncation. We fixed that in version 1.0.2. I notice you set the version to 1.0.1, so please check whether the error still occurs with the correct version.

Jon-AtAWS commented 2 months ago

Ah! That must be the error. What I send to the _ml API is:

    data={
        "name": model_name,
        "version": "1.0.1",
        "model_format": "TORCH_SCRIPT",
        "model_group_id": self.model_group_id
    }

The versioning is confusing... can we support a "$latest" parameter or similar? That way, I can hard-wire a version for production, but can always test with the latest. It's a good thing most models have version 1.0.1... It never errored, so I never paid attention to the version parameter.
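
Until something like "$latest" exists, a client-side workaround could keep the version in one overridable place. This is only a sketch: `build_register_body`, `DEFAULT_MODEL_VERSION`, and the `MODEL_VERSION` environment variable are hypothetical names, not ML Commons features, and available versions differ per model, so the default below is just the value mentioned in this thread for msmarco-distilbert-base-tas-b.

```python
import os

# Version mentioned above as containing the truncation fix for
# msmarco-distilbert-base-tas-b; other models may use different versions.
DEFAULT_MODEL_VERSION = "1.0.2"

def build_register_body(model_name, model_group_id, version=None):
    """Build the _register request body with an overridable model version."""
    # Precedence: explicit argument, then the (hypothetical) MODEL_VERSION
    # environment variable, then the pinned default.
    chosen = version or os.environ.get("MODEL_VERSION", DEFAULT_MODEL_VERSION)
    return {
        "name": model_name,
        "version": chosen,
        "model_format": "TORCH_SCRIPT",
        "model_group_id": model_group_id,
    }
```

Production code can pass an explicit `version`, while test runs can set the environment variable to try a newer one without touching the code.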

By the way, my fix above was not right; I needed to do this:

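def create_max_500_character_string(s):
  # Greedily keep whole words while the result stays within 500 characters.
  ret = ""
  for word in s.split(' '):
    # Include the joining space in the length check; no leading space on the first word.
    candidate = f'{ret} {word}' if ret else word
    if len(candidate) > 500:
      break
    ret = candidate
  return ret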

That is working. So it looks like the truncation has to be by characters, not tokens.
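
A plausible explanation, not confirmed in this thread, is that a single whitespace-separated word can map to more than one model token, so 500 words may exceed the 512 positions named in the error, while 500 characters leaves more headroom. The sketch below shows one way to apply a character budget to every document before sending it to _bulk; `truncate_to_chars` and `prepare_docs` are hypothetical helpers, and the `chunk` field name is taken from the pipeline config above.

```python
def truncate_to_chars(text, max_chars=500):
    """Keep whole words from text while the result stays within max_chars."""
    kept = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space before every word after the first.
        extra = len(word) + (1 if kept else 0)
        if length + extra > max_chars:
            break
        kept.append(word)
        length += extra
    return " ".join(kept)

def prepare_docs(docs, field="chunk", max_chars=500):
    """Yield copies of docs with the given field truncated to max_chars."""
    for doc in docs:
        yield {**doc, field: truncate_to_chars(doc[field], max_chars)}

# Example: a 2000-character chunk is cut down before indexing.
docs = [{"id": 1, "chunk": "word " * 400}]
prepared = list(prepare_docs(docs))
print(len(prepared[0]["chunk"]))
```

The original dictionaries are left unmodified, so the untruncated text is still available if it needs to be stored elsewhere.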

opensearch-project / ml-commons

[BUG] Varying dimensions for distiluse-base-multilingual-cased-v1 #2302