huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Beam search transformers test cases are failing with KeyError: 'limit_hpu_graphs' #445

Closed ankurneog closed 9 months ago

ankurneog commented 9 months ago

System Info

After the recent integration of transformers test cases into optimum-habana, several beam-search-related test cases were observed to fail with the error below. The test case shown is for T5, but several other language modeling models such as GPT2, GPTJ, GPTNEOX, etc. invoke the same test case and therefore fail as well.

Logs:
 <pt> (conda_qnpu1) (anneog_transformers_tests_updates) anneog@anneog-vm-u20:t5 $ python -m pytest -vs test_modeling_t5.py::T5ModelTest::test_beam_search_generate
============================================================================================= test session starts =============================================================================================
platform linux -- Python 3.8.18, pytest-7.4.2, pluggy-1.3.0 -- /home/anneog/anaconda3/envs/conda_qnpu1/bin/python
cachedir: .pytest_cache
rootdir: /home/anneog/github/ankurneog/optimum-habana
configfile: setup.cfg
collecting ... [WARNING|utils.py:179] 2023-10-04 06:56:49,584 >> optimum-habana v1.8.0.dev0 has been validated for SynapseAI v1.11.0 but habana-frameworks v1.13.0.133 was found, this could lead to undefined behavior!
[WARNING|utils.py:196] 2023-10-04 06:56:49,606 >> Could not run `hl-smi`, please follow the installation guide: https://docs.habana.ai/en/latest/Installation_Guide/index.html.
collected 1 item                                                                                                                                                                                              

test_modeling_t5.py::T5ModelTest::test_beam_search_generate ============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 8
CPU RAM       : 40852220 KB
------------------------------------------------------------------------------
FAILED

================================================================================================== FAILURES ===================================================================================================
____________________________________________________________________________________ T5ModelTest.test_beam_search_generate ____________________________________________________________________________________

self = <tests.models.t5.test_modeling_t5.T5ModelTest testMethod=test_beam_search_generate>

    def test_beam_search_generate(self):
        for model_class in self.all_generative_model_classes:
            config, input_ids, attention_mask, max_length = self._get_input_ids_and_config()

            # It is important set set the eos_token_id to None to ensure that no sequences
            # shorter than `max_length` can be generated which could lead to flaky circle ci
            # failures if the top `num_return_sequences` beams are all shorter than the longest beam
            config.eos_token_id = None
            config.forced_eos_token_id = None

            model = model_class(config).to(torch_device).eval()
            if model.config.is_encoder_decoder:
                max_length = 4

            logits_process_kwargs, logits_processor = self._get_logits_processor_and_kwargs(
                input_ids.shape[-1],
                config.eos_token_id,
                config.forced_bos_token_id,
                config.forced_eos_token_id,
                max_length,
            )
            beam_kwargs, beam_scorer = self._get_beam_scorer_and_kwargs(input_ids.shape[0], max_length)

            # check `generate()` and `beam_search()` are equal
>           output_generate, output_beam_search = self._beam_search_generate(
                model=model,
                input_ids=input_ids,
                attention_mask=attention_mask,
                max_length=max_length,
                beam_scorer=beam_scorer,
                beam_kwargs=beam_kwargs,
                logits_process_kwargs=logits_process_kwargs,
                logits_processor=logits_processor,
            )

../../generation/test_utils.py:881: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../generation/test_utils.py:422: in _beam_search_generate
    output_beam_search = model.beam_search(
../../../../../optimum/habana/transformers/generation/utils.py:1995: in beam_search
    hpu_graphs_kwargs = self._get_hpu_graphs_kwargs(model_kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = T5ForConditionalGeneration(
  (shared): Embedding(99, 32)
  (encoder): T5Stack(
    (embed_tokens): Embedding(99, 32)
...m()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (lm_head): Linear(in_features=32, out_features=99, bias=False)
)
model_kwargs = {'encoder_outputs': BaseModelOutputWithPastAndCrossAttentions(last_hidden_state=tensor([[[-4.7808e-04, -6.3646e-04, -2...    grad_fn=<IndexSelectBackward0>), past_key_values=None, hidden_states=None, attentions=None, cross_attentions=None)}

    def _get_hpu_graphs_kwargs(self, model_kwargs):
        hpu_graphs_kwargs = {}
>       if model_kwargs["limit_hpu_graphs"]:
E       KeyError: 'limit_hpu_graphs'

../../../../../optimum/habana/transformers/generation/utils.py:141: KeyError

@p9olisettyvarma could you have a look? I think we should modify the code so that the key is not accessed when it is not present in the dictionary, e.g. check `if key not in dict` and then return `hpu_graphs_kwargs` with default values, as sketched below.
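A minimal sketch of that guard, assuming a falsy default is acceptable when the key is absent (the body of the `if` branch is elided in the traceback above, so it is only hinted at here):

```python
def _get_hpu_graphs_kwargs(self, model_kwargs):
    hpu_graphs_kwargs = {}
    # The upstream transformers tests call `beam_search()` directly, so
    # `model_kwargs` is never populated with `limit_hpu_graphs` (which is
    # presumably set on the `generate()` path). Using `.get()` with a False
    # default avoids the KeyError and simply returns empty kwargs instead.
    if model_kwargs.get("limit_hpu_graphs", False):
        # ... populate hpu_graphs_kwargs as before (implementation not
        # shown in the traceback, so not reproduced here) ...
        pass
    return hpu_graphs_kwargs
```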

Information

Tasks

Reproduction

  1. Clone optimum-habana
  2. pip install pytest
  3. cd optimum-habana/tests/transformers/tests/model/t5
  4. python -m pytest -vs test_modeling_t5.py::T5ModelTest::test_beam_search_generate

Expected behavior

The test should pass.

ankurneog commented 9 months ago

@p9olisettyvarma please have a look. FYI: @regisss

ankurneog commented 9 months ago

FAILED test_modeling_t5.py::T5ModelTest::test_beam_search_generate - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_output - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_beam_search_generate_dict_outputs_use_cache - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_constrained_beam_search_generate_dict_output - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_sample_generate - KeyError: 'limit_hpu_graphs'
FAILED test_modeling_t5.py::T5ModelTest::test_sample_generate_dict_output - KeyError: 'limit_hpu_graphs'