Optimum neuron inference test results with NotImplementedError. #571

Open musunita opened 2 months ago

musunita commented 2 months ago

System Info

Hugging Face Neuron Deep Learning AMI (Ubuntu 22.04) [ami=ami-073e0687022c65b38]

Instance type: Inf2.48xlarge

Pre-installed packages:

(aws_neuron_venv_pytorch) ubuntu@ip-172-31-9-88:~$ pip list | grep "neuron"
aws-neuronx-runtime-discovery 2.9
libneuronxla                  0.5.971
neuronx-distributed           0.7.0
optimum-neuron                0.0.21
torch-xla                     1.13.1+torchneurone
Using version 0.0.21 of optimum-neuron:
(aws_neuron_venv_pytorch) ubuntu@ip-172-31-9-88:~/optimum-neuron$ git branch
* (HEAD detached at v0.0.21)

Who can help?

@dacorvo, @JingyaHuang

Inference test details are enclosed in the attached PDF: Tests in Optimum Neuron (3) (4).pdf

While running the inference tests on Inf2.48xlarge, I encounter the following error.

(aws_neuron_venv_pytorch) ubuntu@ip-172-31-9-88:~/optimum-neuron$ pytest -m is_inferentia_test tests
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.8.10, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/ubuntu/optimum-neuron
configfile: pyproject.toml
plugins: anyio-4.2.0
collected 1134 items / 1 error / 665 deselected / 469 selected                                                                                                                            

========================================================================================== ERRORS ==========================================================================================
_________________________________________________________________________ ERROR collecting tests/ _________________________________________________________________________
tests/ in <module>
    class CausalLMExampleTester(ExampleTesterBase, metaclass=ExampleTestMeta, example_name="run_clm"):
tests/ in __new__
    pp_support = ParallelizersManager.parallelizer_for_model(model_type).supports_pipeline_parallelism()
optimum/neuron/distributed/ in parallelizer_for_model
    raise NotImplementedError(
E   NotImplementedError: gpt2 is not supported for parallelization, supported models: bert, roberta, gpt_neo, gpt_neox, llama, mistral, t5
==================================================================================== warnings summary =====================================================================================
  /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/optimum/commands/ DeprecationWarning: pkg_resources is deprecated as an API. See
    from pkg_resources import get_distribution

  /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/pkg_resources/ DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See

  /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/pkg_resources/ DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See

  /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/pkg_resources/ DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('zope')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See

================================================================================= short test summary info =================================================================================
ERROR tests/ - NotImplementedError: gpt2 is not supported for parallelization, supported models: bert, roberta, gpt_neo, gpt_neox, llama, mistral, t5
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
====================================================================== 665 deselected, 5 warnings, 1 error in 3.41s ======================================================================
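
The traceback suggests the exception is raised while the test class itself is being created (inside the metaclass `__new__`), which would explain why pytest stops at collection instead of failing a single test. A minimal sketch of that failure mode follows. The registry, both metaclasses, and the helper names are hypothetical stand-ins, not the optimum-neuron code, and the guarded variant is only one possible way to avoid the crash, not necessarily the fix that was applied.

```python
# Hypothetical stand-in for the parallelizer registry; the model list is
# copied from the error message in the log above.
SUPPORTED_MODELS = {"bert", "roberta", "gpt_neo", "gpt_neox", "llama", "mistral", "t5"}


def parallelizer_for_model(model_type: str) -> str:
    """Mimic a strict lookup that raises for unsupported model types."""
    if model_type not in SUPPORTED_MODELS:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise NotImplementedError(
            f"{model_type} is not supported for parallelization, supported models: {supported}"
        )
    return f"{model_type}-parallelizer"


def supports_parallelization(model_type: str) -> bool:
    """Guarded variant: report missing support instead of raising."""
    try:
        parallelizer_for_model(model_type)
    except NotImplementedError:
        return False
    return True


class StrictMeta(type):
    """Calls the strict lookup while the class is being created."""

    def __new__(mcls, name, bases, namespace, model_type=None):
        if model_type is not None:
            # An unsupported model type raises here, i.e. at import time.
            parallelizer_for_model(model_type)
        return super().__new__(mcls, name, bases, namespace)


class GuardedMeta(type):
    """Records support as a class attribute so tests can be skipped later."""

    def __new__(mcls, name, bases, namespace, model_type=None):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.pp_supported = model_type is not None and supports_parallelization(model_type)
        return cls


def define_strict(model_type: str) -> type:
    # Equivalent to `class Tester(metaclass=StrictMeta, model_type=...)`.
    return StrictMeta("Tester", (), {}, model_type=model_type)


def define_guarded(model_type: str) -> type:
    # Equivalent to `class Tester(metaclass=GuardedMeta, model_type=...)`.
    return GuardedMeta("Tester", (), {}, model_type=model_type)
```

In this sketch, `define_strict("gpt2")` raises as soon as the class is defined, whereas `define_guarded("gpt2")` produces a class whose `pp_supported` attribute is `False`, so the module can still be imported and collected.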



Reproduction (minimal, reproducible, runnable)

Steps:

a. Run the tests: pytest -m is_inferentia_test tests

Expected behavior

I expect the inference test suite to pass.

dacorvo commented 2 months ago

cc @michaelbenayoun

musunita commented 2 months ago

Thanks for the fix. I tried it and I no longer see the errors above, which is great. However, I now encounter a new error, shown below.

No input shapes provided, using default shapes, {'batch_size': 1, 'sequence_length': 128}

___ test_export_no_parameters[feature-extraction-hf-internal-testing/tiny-random-XLMModel] ___

self = <optimum.neuron.utils.hub_neuronx_cache.CompileCacheHfProxy object at 0x7f0d8009f850>, repo_id = 'optimum-internal-testing/optimum-neuron-cache-for-testing'
default_cache = <libneuronxla.neuron_cc_cache.CompileCacheFs object at 0x7f0f393d7c70>, endpoint = None, token = None

def __init__(
    self, repo_id: str, default_cache: CompileCache, endpoint: Optional[str] = None, token: Optional[str] = None
):
    # Initialize the proxy cache as expected by the parent class
    self.cache_path = default_cache.cache_path
    # Initialize specific members
    self.default_cache = default_cache
    self.api = HfApi(endpoint=endpoint, token=token, library_name="optimum-neuron", library_version=__version__)
    # Check if the HF cache id is valid
    if not self.api.repo_exists(repo_id):
        raise ValueError(f"The {repo_id} repository does not exist or you don't have access to it.")

E ValueError: The optimum-internal-testing/optimum-neuron-cache-for-testing repository does not exist or you don't have access to it.

optimum/neuron/utils/ ValueError

michaelbenayoun commented 2 months ago

You need to set the CUSTOM_CACHE_REPO environment variable to a Hub repo to use as the cache repo, and set the HF_TOKEN environment variable to a token that has write access to that repo.
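
As a rough illustration, both variables could be checked before launching the suite. This is only a sketch: the variable names come from the comment above, while the helper name `missing_cache_env` and the sample values in the usage note are made up for the example.

```python
import os
from typing import List, Mapping

# Variable names taken from the comment above.
REQUIRED_VARS = ("CUSTOM_CACHE_REPO", "HF_TOKEN")


def missing_cache_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_cache_env()` with no arguments inspects the current process environment. Passing a plain dict such as `{"CUSTOM_CACHE_REPO": "my-org/my-cache"}` (a placeholder repo id) makes it easy to see which variable is still missing before running pytest -m is_inferentia_test tests.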

musunita commented 2 months ago

