huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)

No support for optimum-habana pipeline() causes error during inference for PyTorch BERT finetuned model using dtype bf16 #228

Closed hchauhan123 closed 1 year ago

hchauhan123 commented 1 year ago

System Info

optimum-habana 1.5.0
docker version 1.9.0
pytorch version 1.13.1

Reproduction

Running inference with a BERT model (bert-large-uncased) finetuned on the Financial PhraseBank dataset with the bf16 data type results in an error.

The finetuning on Gaudi (HPU) is done with the optimum-habana library.

transformers (4.28.1) and the supporting libraries are installed as part of the optimum-habana installation.

Finetuning works well for both data types (bf16 and fp32). Inference works well with fp32, but with bf16 it results in an error.

The finetuning code is in the finbert.py file shown below.

import sys
import subprocess

# Install the dependencies (note: the pip package name is optimum-habana)
subprocess.check_call([sys.executable, '-m', 'pip', 'install',
                       'numpy', 'pandas', 'scikit-learn', 'datasets', 'optimum-habana', '--user'])

import pandas as pd
import numpy as np
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from datasets import Dataset

def load_data():
    df = pd.read_csv(
        'FinancialPhraseBank-v1.0/Sentences_50Agree.txt',
        sep='@',
        names=['sentence', 'label'],
        encoding = "ISO-8859-1")
    df = df.dropna()
    df['label'] = df['label'].map({"neutral": 0, "positive": 1, "negative": 2})
    df.head()

    df_train, df_test = train_test_split(df, stratify=df['label'], test_size=0.1, random_state=42)
    df_train, df_val = train_test_split(df_train, stratify=df_train['label'], test_size=0.1, random_state=42)

    dataset_train = Dataset.from_pandas(df_train, preserve_index=False)
    dataset_val = Dataset.from_pandas(df_val, preserve_index=False)
    dataset_test = Dataset.from_pandas(df_test, preserve_index=False)

    return dataset_train, dataset_val, dataset_test

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {'accuracy': accuracy_score(predictions, labels)}

def main():
    dataset_train, dataset_val, dataset_test = load_data()

    bert_model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', num_labels=3)
    bert_tokenizer = AutoTokenizer.from_pretrained('bert-large-uncased')

    dataset_train = dataset_train.map(lambda e: bert_tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
    dataset_val = dataset_val.map(lambda e: bert_tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
    dataset_test = dataset_test.map(lambda e: bert_tokenizer(e['sentence'], truncation=True, padding='max_length' , max_length=128), batched=True)

    dataset_train.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask', 'label'])
    dataset_val.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask', 'label'])
    dataset_test.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask', 'label'])

    args = GaudiTrainingArguments(
        output_dir='temp/',
        overwrite_output_dir=True,
        evaluation_strategy='epoch',
        save_strategy='no',
        logging_strategy='epoch',
        logging_dir='logs/',
        report_to='tensorboard',

        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=4,
        num_train_epochs=5,
        weight_decay=0.01,
        metric_for_best_model='accuracy',

        use_habana=True,                        # use Habana device
        use_lazy_mode=True,                     # use Gaudi lazy mode
        use_hpu_graphs=True,                    # set value for hpu_graphs
        gaudi_config_name='gaudi_config.json',  # load config file
    )

    trainer = GaudiTrainer(
        model=bert_model,                   # the instantiated 🤗 Transformers model to be trained
        args=args,                          # training arguments, defined above
        train_dataset=dataset_train,        # training dataset
        eval_dataset=dataset_val,           # evaluation dataset
        compute_metrics=compute_metrics
    )

    trainer.train()   

if __name__ == '__main__':
    main()

It also needs a gaudi_config.json file, which configures the bf16 mixed-precision training. The gaudi_config.json file is:

{
  "use_habana_mixed_precision": true,
  "hmp_is_verbose": false,
  "use_fused_adam": true,
  "use_fused_clip_norm": true,
  "hmp_bf16_ops": [
    "add",
    "addmm",
    "bmm",
    "div",
    "dropout",
    "gelu",
    "iadd",
    "linear",
    "layer_norm",
    "matmul",
    "mm",
    "rsub",
    "softmax",
    "truediv"
  ],
  "hmp_fp32_ops": [
    "embedding",
    "nll_loss",
    "log_softmax",
    "cross_entropy"
  ]
}

Note: Keep the finbert.py and gaudi_config.json files in the same folder.

Run it with:

export MASTER_ADDR="localhost"
export MASTER_PORT="12345"
mpirun -n 8 --bind-to core --map-by socket:PE=4 --rank-by core --report-bindings --allow-run-as-root python finbert.py

Note: It can also be finetuned on 1 card for debugging purposes.

After completing the finetuning with the bf16 dtype, running either inference code-1 or code-2 below results in an error.

Inference code-1:

import torch
from transformers import pipeline

# bert_model and bert_tokenizer are the objects from the finetuning session above
device = torch.device('hpu')
pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device)
print(pipe("Alabama Takes From the Poor and Gives to the Rich"))
print(pipe("Economists are predicting the highest rate of employment in 15 years"))

Inference code-2:

import torch
from transformers import TextClassificationPipeline

# bert_model and bert_tokenizer are the objects from the finetuning session above
pipe = TextClassificationPipeline(model=bert_model, tokenizer=bert_tokenizer)
pipe.device = torch.device('hpu')
print(pipe("Alabama Takes From the Poor and Gives to the Rich"))
print(pipe("Economists are predicting the highest rate of employment in 15 years"))

Error seen when running inference:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:1146, in _LazyModule._get_module(self, module_name)
   1145 try:
-> 1146     return importlib.import_module("." + module_name, self.__name__)
   1147 except Exception as e:

File /usr/lib/python3.8/importlib/__init__.py:127, in import_module(name, package)
    126         level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:991, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:975, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:671, in _load_unlocked(spec)
File <frozen importlib._bootstrap_external>:848, in exec_module(self, module)
File <frozen importlib._bootstrap>:219, in _call_with_frames_removed(f, *args, **kwds)

File /usr/local/lib/python3.8/dist-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:56
     51 # Fused kernels
     52 # Use separate functions for each case because conditionals prevent kernel fusion.
     53 # TODO: Could have better fused kernels depending on scaling, dropout and head mask.
     54 #  Is it doable without writing 32 functions?
     55 @torch.jit.script
---> 56 def upcast_masked_softmax(
     57     x: torch.Tensor, mask: torch.Tensor, mask_value: torch.Tensor, scale: float, softmax_dtype: torch.dtype
     58 ):
     59     input_dtype = x.dtype

File /usr/local/lib/python3.8/dist-packages/torch/jit/_script.py:1343, in script(obj, optimize, _frames_up, _rcb, example_inputs)
   1342     _rcb = _jit_internal.createResolutionCallbackFromClosure(obj)
-> 1343 fn = torch._C._jit_script_compile(
   1344     qualified_name, ast, _rcb, get_default_args(obj)
   1345 )
   1346 # Forward docstrings

File /usr/local/lib/python3.8/dist-packages/torch/jit/_recursive.py:863, in try_compile_fn(fn, loc)
    862 rcb = _jit_internal.createResolutionCallbackFromClosure(fn)
--> 863 return torch.jit.script(fn, _rcb=rcb)

File /usr/local/lib/python3.8/dist-packages/torch/jit/_script.py:1343, in script(obj, optimize, _frames_up, _rcb, example_inputs)
   1342     _rcb = _jit_internal.createResolutionCallbackFromClosure(obj)
-> 1343 fn = torch._C._jit_script_compile(
   1344     qualified_name, ast, _rcb, get_default_args(obj)
   1345 )
   1346 # Forward docstrings

RuntimeError:
Unknown type name 'DType':
  File "/usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/hpex/hmp/utils.py", line 1811
def softmax(input: Tensor, dim: Optional[int] = None, _stacklevel: int = 3, dtype: Optional[DType] = None) -> Tensor:
                                                                                            ~~~~~ <--- HERE
    r"""Applies a softmax function.
'softmax' is being compiled since it was called from 'upcast_masked_softmax'
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py", line 62
    x = x.to(softmax_dtype) * scale
    x = torch.where(mask, x, mask_value)
    x = torch.nn.functional.softmax(x, dim=-1).to(input_dtype)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return x

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[26], line 3
      1 from transformers import pipeline
      2 device=torch.device('hpu')
----> 3 pipe = pipeline("text-classification", model=trainer.model, tokenizer=bert_tokenizer, device=device)
      4 #pipe = TextClassificationPipeline(model=bert_model, tokenizer=bert_tokenizer)
      5 #pipe = TextClassificationPipeline(model=bert_model, tokenizer=bert_tokenizer)
      6 #pipe.device=torch.device('hpu')
      8 print(pipe("Alabama Takes From the Poor and Gives to the Rich"))

File /usr/local/lib/python3.8/dist-packages/transformers/pipelines/__init__.py:979, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    976 if device is not None:
    977     kwargs["device"] = device
--> 979 return pipeline_class(model=model, framework=framework, task=task, **kwargs)

File /usr/local/lib/python3.8/dist-packages/transformers/pipelines/text_classification.py:85, in TextClassificationPipeline.__init__(self, **kwargs)
     82 def __init__(self, **kwargs):
     83     super().__init__(**kwargs)
---> 85     self.check_model_type(
     86         TF_MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING
     87         if self.framework == "tf"
     88         else MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING
     89     )

File /usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py:942, in Pipeline.check_model_type(self, supported_models)
    940 if not isinstance(supported_models, list):  # Create from a model mapping
    941     supported_models_names = []
--> 942     for config, model in supported_models.items():
    943         # Mapping can now contain tuples of models for the same configuration.
    944         if isinstance(model, tuple):
    945             supported_models_names.extend([_model.__name__ for _model in model])

File /usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py:644, in _LazyAutoMapping.items(self)
    643 def items(self):
--> 644     mapping_items = [
    645         (
    646             self._load_attr_from_module(key, self._config_mapping[key]),
    647             self._load_attr_from_module(key, self._model_mapping[key]),
    648         )
    649         for key in self._model_mapping.keys()
    650         if key in self._config_mapping.keys()
    651     ]
    652     return mapping_items + list(self._extra_content.items())

File /usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py:647, in <listcomp>(.0)
    643 def items(self):
    644     mapping_items = [
    645         (
    646             self._load_attr_from_module(key, self._config_mapping[key]),
--> 647             self._load_attr_from_module(key, self._model_mapping[key]),
    648         )
    649         for key in self._model_mapping.keys()
    650         if key in self._config_mapping.keys()
    651     ]
    652     return mapping_items + list(self._extra_content.items())

File /usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py:616, in _LazyAutoMapping._load_attr_from_module(self, model_type, attr)
    614 if module_name not in self._modules:
    615     self._modules[module_name] = importlib.import_module(f".{module_name}", "transformers.models")
--> 616 return getattribute_from_module(self._modules[module_name], attr)

File /usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py:561, in getattribute_from_module(module, attr)
    559 if isinstance(attr, tuple):
    560     return tuple(getattribute_from_module(module, a) for a in attr)
--> 561 if hasattr(module, attr):
    562     return getattr(module, attr)
    563 # Some of the mappings have entries model_type -> object of another model type. In that case we try to grab the
    564 # object at the top level.

File /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:1136, in _LazyModule.__getattr__(self, name)
   1134     value = self._get_module(name)
   1135 elif name in self._class_to_module.keys():
-> 1136     module = self._get_module(self._class_to_module[name])
   1137     value = getattr(module, name)
   1138 else:

File /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:1148, in _LazyModule._get_module(self, module_name)
   1146     return importlib.import_module("." + module_name, self.__name__)
   1147 except Exception as e:
-> 1148     raise RuntimeError(
   1149         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1150         f" traceback):\n{e}"
   1151     ) from e

RuntimeError: Failed to import transformers.models.gpt_bigcode.modeling_gpt_bigcode because of the following error (look up to see its traceback):
Unknown type name 'DType':
  File "/usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/hpex/hmp/utils.py", line 1811
def softmax(input: Tensor, dim: Optional[int] = None, _stacklevel: int = 3, dtype: Optional[DType] = None) -> Tensor:
                                                                                            ~~~~~ <--- HERE
    r"""Applies a softmax function.
'softmax' is being compiled since it was called from 'upcast_masked_softmax'
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py", line 62
    x = x.to(softmax_dtype) * scale
    x = torch.where(mask, x, mask_value)
    x = torch.nn.functional.softmax(x, dim=-1).to(input_dtype)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return x

Expected behavior

Inference is expected to work in bf16 just as it does with the fp32 dtype.

The expected output is shown below:

[{'label': 'neutral', 'score': 0.9094224572181702}]
[{'label': 'positive', 'score': 0.9752092957496643}]

regisss commented 1 year ago

Hi @hchauhan123! This is actually not a bug as Optimum Habana does not support Transformers' pipelines at the moment.

If you want to run inference on your test set, I recommend adding trainer.evaluate() right after trainer.train(); this should work. Here is more information about running inference on Gaudi using the library.
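For example, the end of main() in finbert.py would look something like this (a minimal sketch, reusing the dataset_test split already returned by load_data()):

    trainer.train()

    # Run inference/evaluation on the held-out test split with the trainer itself
    metrics = trainer.evaluate(eval_dataset=dataset_test)
    print(metrics)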

hchauhan123 commented 1 year ago

Hi @regisss, so I tried two methods right after trainer.train() for inference. (1) trainer.evaluate(eval_dataset=dataset_test), which gave me the overall summary below:

{'eval_loss': 0.90251624584198,
 'eval_accuracy': 0.8577319587628865,
 'eval_runtime': 5.853,
 'eval_samples_per_second': 82.864,
 'eval_steps_per_second': 20.844,
 'epoch': 5.0,
 'memory_allocated (GB)': 7.49,
 'max_memory_allocated (GB)': 9.99,
 'total_memory_available (GB)': 30.24}

(2) trainer.predict(dataset_test).metrics, which gave the output summary below:

{'test_loss': 0.90251624584198,
 'test_accuracy': 0.8536082474226804,
 'test_runtime': 1.2496,
 'test_samples_per_second': 388.11,
 'test_steps_per_second': 97.628}

This means that, at least here, the model is able to run inference after being finetuned with the bf16 dtype as above. But I am unable to do what pipeline() helped me achieve: I am not sure how to provide a single sentence or test input and get back its label and score.

I understand that with trainer.evaluate() and trainer.predict() the summary is for the whole test dataset. dataset_test[1] does not help either to select only one test entry.
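Roughly, something like this is what I am after (just a sketch; the label order is taken from the mapping in load_data(), and the softmax post-processing is my own guess):

import torch
from datasets import Dataset

labels = ['neutral', 'positive', 'negative']  # same order as the label mapping in load_data()

# Wrap a single sentence in a Dataset, tokenize it like the training data, and predict with the trainer
single = Dataset.from_dict({'sentence': ['Economists are predicting the highest rate of employment in 15 years']})
single = single.map(lambda e: bert_tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
single.set_format(type='torch', columns=['input_ids', 'token_type_ids', 'attention_mask'])

logits = trainer.predict(single).predictions            # shape (1, 3)
probs = torch.softmax(torch.tensor(logits), dim=-1)[0]
print({'label': labels[int(probs.argmax())], 'score': float(probs.max())})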

Also, can you help me understand the original question: why would the above pipeline() work with the fp32 dtype but not with bf16?

regisss commented 1 year ago

@hchauhan123 You're using a BERT model but the error refers to a GPTBigCode architecture, that's weird. I can tell you that I've observed the same error when using mixed-precision with GPTBigCode. This is something we are aware of and that we are going to investigate :slightly_smiling_face: With BERT it should work though...

Could you try adding torch_dtype=torch.bfloat16 to your pipeline, like this:

import torch
from transformers import pipeline 
device=torch.device('hpu') 
pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device, torch_dtype=torch.bfloat16) 
print(pipe("Alabama Takes From the Poor and Gives to the Rich")) 
print(pipe("Economists are predicting the highest rate of employment in 15 years"))

please?

hchauhan123 commented 1 year ago

@regisss I re-ran by adding torch_dtype=torch.bfloat16 and it still gives the same error as we saw earlier with the bf16 dtype.

RuntimeError: Failed to import transformers.models.gpt_bigcode.modeling_gpt_bigcode because of the following error 
(look up to see its traceback):

Unknown type name 'DType':
  File "/usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/hpex/hmp/utils.py", line 1813
def softmax(input: Tensor, dim: Optional[int] = None, _stacklevel: int = 3, dtype: Optional[DType] = None) -> 
Tensor:
                                                                                            ~~~~~ <--- HERE
    r"""Applies a softmax function.
'softmax' is being compiled since it was called from 'upcast_masked_softmax'
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py", line 62
    x = x.to(softmax_dtype) * scale
    x = torch.where(mask, x, mask_value)
    x = torch.nn.functional.softmax(x, dim=-1).to(input_dtype)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return x

hchauhan123 commented 1 year ago

Also, I think the module named in the error depends on the transformers version installed. Just before running inference, I installed a different transformers version (4.20.1) to override the one (4.28.1) that comes with optimum-habana, and the error then pointed to a different module. Again, it is weird that it fails in some other model's module when I am using BERT, and again it happens only with bf16.

Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py", line 1146, in _get_module
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers.models.ernie.modeling_ernie'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_79/1761011136.py", line 5, in <module>
    pipe = TextClassificationPipeline(model=bert_model, tokenizer=bert_tokenizer)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/text_classification.py", line 85, in __init__
    if isinstance(top_k, int) or top_k is None:
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 942, in check_model_type
    raise NotImplementedError("postprocess not implemented")
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 644, in items
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 647, in <listcomp>
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 616, in _load_attr_from_module
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 561, in getattribute_from_module
    return self._extra_content[key]
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py", line 1136, in __getattr__
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py", line 1148, in _get_module
RuntimeError: Failed to import transformers.models.ernie.modeling_ernie because of the following error (look up to see its traceback):
No module named 'transformers.models.ernie.modeling_ernie'

During handling of the above exception, another exception occurred:

regisss commented 1 year ago

> @regisss I re-ran by adding torch_dtype=torch.bfloat16 and it still gives the same error as we saw earlier with the bf16 dtype.

That's strange. Have you run this code snippet in a script where a GaudiTrainer object was already instantiated? If yes, could you try after commenting/removing the trainer instantiation?

I'm going to try to reproduce it on my side.

hchauhan123 commented 1 year ago

I am basically running this in a Jupyter notebook. So yes, the GaudiTrainer object was already instantiated in an earlier cell and the finetuning was done there; the inference code comes just after that, in another cell.

regisss commented 1 year ago

I see. Can you try running the following code snippet please?

import torch
from transformers import pipeline 
from habana_frameworks.torch.hpex import hmp

device=torch.device('hpu') 
pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device, torch_dtype=torch.bfloat16) 

with hmp.disable_casts():
    print(pipe("Alabama Takes From the Poor and Gives to the Rich")) 
    print(pipe("Economists are predicting the highest rate of employment in 15 years"))    

hchauhan123 commented 1 year ago

It gives the same error as posted above. No change.

regisss commented 1 year ago

Hmm. And this?

import torch
from habana_frameworks.torch.hpex import hmp

device=torch.device('hpu') 

with hmp.disable_casts():
    from transformers import pipeline 
    pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device, torch_dtype=torch.bfloat16) 
    print(pipe("Alabama Takes From the Poor and Gives to the Rich")) 
    print(pipe("Economists are predicting the highest rate of employment in 15 years"))    

I have the feeling that the pipeline instantiation is the culprit here. It imports several architectures, which would explain why you get errors that are not related to your model.
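To illustrate what I mean, this is roughly what happens under the hood (a sketch, not something you need to run): check_model_type iterates over the lazy model mapping, and resolving each entry imports that model's module, so a failure in any architecture (here gpt_bigcode) surfaces even though your model is BERT.

from transformers import MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING

# Iterating the lazy mapping resolves, and therefore imports, every registered
# sequence-classification architecture, gpt_bigcode included.
for config_cls, model_cls in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING.items():
    pass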

hchauhan123 commented 1 year ago

Again, the same error. Yes, that could be the case. And what if I use TextClassificationPipeline (inference code-2)? I believe that since it is still built on the same pipeline machinery, the same issue occurs there too, right?

regisss commented 1 year ago

You can try, but I think a similar error will be raised, probably pointing at another imported architecture. I'm going to see if I can reproduce it.

hchauhan123 commented 1 year ago

I meant, I have already tried that too and saw the same error even with TextClassificationPipeline.

regisss commented 1 year ago

> I meant, I have already tried that too and saw the same error even with TextClassificationPipeline.

Yes I'm not surprised by this.

I managed to reproduce this error. I'm going to investigate it and will let you know when I find something.

regisss commented 1 year ago

Okay, so I managed to make it work with:

import torch
torch.jit._state.disable()

device=torch.device('hpu') 

from transformers import pipeline 
pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device) 
print(pipe("Alabama Takes From the Poor and Gives to the Rich")) 
print(pipe("Economists are predicting the highest rate of employment in 15 years"))    

Could you try it?

hchauhan123 commented 1 year ago

Awesome, that works. torch.jit does not seem to be supported, so disabling it works.

regisss commented 1 year ago

Great :tada:

This solution is a bit hacky, but hopefully it should not be needed for much longer, as we are going to use native PyTorch autocast for managing mixed precision (the PR is open here: https://github.com/huggingface/optimum-habana/pull/226). That should no longer interfere with pipelines.
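Once that lands, I would expect inference to look more like standard autocast usage, roughly along these lines (just a sketch, assuming the HPU backend is exposed through torch.autocast; the exact API may differ once the PR is merged):

import torch
from transformers import pipeline

device = torch.device('hpu')
pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device)

# Assumed usage: native autocast on HPU instead of HMP's bf16/fp32 op lists
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    print(pipe("Economists are predicting the highest rate of employment in 15 years"))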

Another remark, in case you would like to improve inference speed: the code snippet below

import torch
torch.jit._state.disable()

from transformers import pipeline
from habana_frameworks.torch.hpex import hmp

device=torch.device('hpu') 

with hmp.disable_casts(): 
    pipe = pipeline("text-classification", model=bert_model, tokenizer=bert_tokenizer, device=device, torch_dtype=torch.bfloat16) 
    print(pipe("Alabama Takes From the Poor and Gives to the Rich")) 
    print(pipe("Economists are predicting the highest rate of employment in 15 years"))    

will probably give better latency and throughput, as the model is fully cast to bf16, whereas the current way uses mixed precision (i.e. a mix of bf16 and fp32).

Let me know if we can close this issue!

hchauhan123 commented 1 year ago

Yes, please close the issue.