triton-inference-server / model_navigator

Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
https://triton-inference-server.github.io/model_navigator/
Apache License 2.0

`model-navigator optimize` stops with an error #18

Closed Nishikoh closed 1 year ago

Nishikoh commented 1 year ago

Running `model-navigator optimize my_model.nav` following the instructions in quick_start.md stops with an error.

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/cli/analyze.py", line 90, in analyze_cmd
    analyze_results = analyzer.run()
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/model_analyzer/analyzer.py", line 72, in run
    analyzer.run(mode=ModelAnalyzerMode.ANALYZE, verbose=self._verbose, quiet=quiet)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/model_analyzer/model_analyzer.py", line 73, in run
    raise ModelNavigatorException(
model_navigator.exceptions.ModelNavigatorException: Running model-analyzer with ['model-analyzer', '--quiet', 'analyze', '-f', '/scratch_space/navigator_workspace/analyzer/config-analyze.yaml'] failed with exit status 1 : None output : None
```

`model-analyzer analyze` is an alias for `model-analyzer profile`, so I think the `analysis_models` key in the generated navigator_workspace/analyzer/config-analyze.yaml is wrong. Changing `analysis_models` to `profile_models` should work.
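For illustration, the suggested rename in navigator_workspace/analyzer/config-analyze.yaml would look roughly like this (the model name and surrounding keys are hypothetical; only the key rename comes from the report above):

```yaml
# Before: key rejected by the newer model-analyzer, where `analyze` aliases `profile`
# analysis_models: my_model

# After: the suggested rename
profile_models: my_model
```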

environment

jkosek commented 1 year ago

@Nishikoh, please limit the Model Analyzer version in setup.cfg in the cli section as follows:

triton-model-analyzer>=1.16.0,<1.22.0

and add to install_requires:

numpy>=1.22.2,<1.24.0
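Taken together, the two pins in setup.cfg would look roughly like this (a sketch assuming the cli section is a standard extra under `[options.extras_require]`; other entries elided):

```ini
[options]
install_requires =
    numpy>=1.22.2,<1.24.0

[options.extras_require]
cli =
    triton-model-analyzer>=1.16.0,<1.22.0
```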

We are currently working on releasing a new version that introduces API changes and also removes Triton Model Analyzer from the dependencies.

Nishikoh commented 1 year ago

@jkosek The `model-analyzer analyze` command worked, thank you! But then I hit another error. Is there a solution?

error log

```py
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/utils/config.py", line 101, in from_dict
    return dict2dataclass(cls, data)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/utils/config.py", line 86, in dict2dataclass
    return dacite.from_dict(cls, data, config=dacite.Config(cast=[Enum, Path, tuple, np.dtype]))
  File "/usr/local/lib/python3.8/dist-packages/dacite/core.py", line 69, in from_dict
    raise WrongTypeError(field_path=field.name, field_type=field_type, value=value)
dacite.exceptions.WrongTypeError: wrong value type for field "engine_count_per_device" - should be "typing.Dict[model_navigator.triton.config.DeviceKind, int]" instead of value "{'gpu': 5}" of type "dict"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/model-navigator", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/cli/main.py", line 53, in main
    cli(max_content_width=160)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/cli/optimize.py", line 307, in optimize_cmd
    create_helm_chart_result = ctx.forward(
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 781, in forward
    return __self.invoke(__cmd, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/cli/helm_chart_create.py", line 127, in helm_chart_create_cmd
    instances_config = TritonModelInstancesConfig.from_dict(kwargs)
  File "/usr/local/lib/python3.8/dist-packages/model_navigator/utils/config.py", line 105, in from_dict
    raise ValueError(e)
ValueError: wrong value type for field "engine_count_per_device" - should be "typing.Dict[model_navigator.triton.config.DeviceKind, int]" instead of value "{'gpu': 5}" of type "dict"
```
environment

```yaml
cpu:
  logical_cores: 4
  max_frequency: 0.0
  min_frequency: 0.0
  name: Intel(R) Xeon(R) CPU @ 2.20GHz
  physical_cores: 2
gpu:
  cuda_version: '12.0'
  driver_version: 510.47.03
  memory: 15360 MiB
  name: Tesla T4
  tdp: 70.00 W
libraries:
  CUBLAS_VERSION: 12.0.2.224
  CUDA_DRIVER_VERSION: 525.85.11
  CUDA_VERSION: 12.0.1.010
  CUDNN_VERSION: 8.7.0.84+cuda11.8
  CUFFT_VERSION: 11.0.1.95
  CURAND_VERSION: 10.3.1.124
  CUSOLVER_VERSION: 11.4.3.1
  CUSPARSE_VERSION: 12.0.1.140
  DALI_BUILD: '6799315'
  DALI_VERSION: 1.21.0
  NCCL_VERSION: 2.16.5
  NPP_VERSION: 12.0.1.104
  NSIGHT_COMPUTE_VERSION: 2022.4.1.6
  NSIGHT_SYSTEMS_VERSION: 2022.5.1.93
  NVIDIA_BUILD_ID: '52277748'
  NVIDIA_TRITON_SERVER_VERSION: '23.01'
  NVJPEG_VERSION: 12.0.1.102
  OPENMPI_VERSION: 4.1.4
  OPENUCX_VERSION: 1.14.0
  TRITON_SERVER_VERSION: 2.30.0
  TRTOSS_VERSION: '23.01'
  TRT_VERSION: 8.5.2.2+cuda11.8.0.065+520.61.05+cublas11.11.3.6
memory: 14.7G
os:
  name: posix
  platform: Linux
  release: 4.19.0-23-cloud-amd64
python_packages:
  numpy: 1.23.5
  onnx: 1.13.0
  onnxruntime-gpu: 1.14.0
  polygraphy: 0.44.2
  tensorrt: 8.5.3.1
  torch: 1.13.1
  torch-tensorrt: 1.3.0
  triton-model-analyzer: 1.21.0
  tritonclient: 2.30.0
python_version: 3.8.10
```
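The `WrongTypeError` above comes from dacite refusing the plain-string key `'gpu'` where the dataclass field expects a `DeviceKind` enum key (the `cast` config appears to cover values but not dict keys here). A minimal stdlib sketch of the key coercion that would satisfy the field type, using hypothetical stand-ins for the `model_navigator.triton.config` types:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

# Hypothetical stand-ins for model_navigator.triton.config types.
class DeviceKind(Enum):
    CPU = "cpu"
    GPU = "gpu"

@dataclass
class TritonModelInstancesConfig:
    engine_count_per_device: Dict[DeviceKind, int]

def coerce_device_keys(raw: Dict[str, int]) -> Dict[DeviceKind, int]:
    # Cast plain-string keys like 'gpu' to the enum the dataclass expects;
    # this is the conversion missing when the raw dict is passed through.
    return {DeviceKind(k): v for k, v in raw.items()}

cfg = TritonModelInstancesConfig(engine_count_per_device=coerce_device_keys({"gpu": 5}))
print(cfg.engine_count_per_device[DeviceKind.GPU])  # 5
```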
jkosek commented 1 year ago

@Nishikoh I suggest switching to the new flow introduced in version 0.4.0:

Nishikoh commented 1 year ago

It worked well. Thank you!