triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

'NoneType' object has no attribute 'loader' when trying to load DALI model. #236

Closed Skier23 closed 3 months ago

Skier23 commented 3 months ago

I'm trying to follow along with the simple tutorials to get a DALI preprocessing pipeline in place with Triton, but I'm getting some errors when trying to do so:

Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 568, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/ops/__init__.py:425: DeprecationWarning: WARNING: `image_decoder` is now deprecated. Use `decoders.image` instead.
In DALI 1.0 all decoders were moved into a dedicated :mod:`~nvidia.dali.fn.decoders`
submodule and renamed to follow a common pattern. This is a placeholder operator with identical
functionality to allow for backward compatibility.
  op_instances.append(_OperatorInstance(input_set, self, **kwargs))
I0403 19:43:51.541214 1 dali_model.h:212] DALI pipeline from file /models/dali_preprocessing/1/dali.py loaded successfully.
I0403 19:43:51.577027 1 dali_backend.cc:164] TRITONBACKEND_ModelFinalize: delete model state
E0403 19:43:51.577063 1 model_lifecycle.cc:638] failed to load 'dali_preprocessing' version 1: Unknown: DALI Backend error: Critical error when building pipeline:
Error when constructing operator: ImageDecoder encountered:
Error in thread 0: nvml error (3): The nvml requested operation is not available on target device
Current pipeline object is no longer valid.

dali.py:

import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@autoserialize 
@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0)
def pipe():
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    # fn.image_decoder is deprecated since DALI 1.0 (see the
    # DeprecationWarning in the log above); fn.decoders.image replaces it.
    images = fn.decoders.image(images, device="mixed", output_type=types.RGB)
    # Resize to 384x384 using bicubic interpolation
    images = fn.resize(images, resize_x=384, resize_y=384, interp_type=types.INTERP_CUBIC)
    # Normalize
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT16,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        output_layout="CHW")
    return images

I get the same/similar error when using a serialized DALI model.

Skier23 commented 3 months ago

I get the same error if I use the exact code and config from here: https://developer.nvidia.com/blog/accelerating-inference-with-triton-inference-server-and-dali/

JanuszL commented 3 months ago

Hi @Skier23,

Can you tell us how you run the Triton server? Which OS and GPU do you use? It seems that NVML, which is part of the CUDA driver, cannot be accessed.

Skier23 commented 3 months ago

Sure, this is the docker command I use to launch the Triton server. I have been using this command previously and running benchmarks on the Triton server model, and it's definitely using the GPU/CUDA:

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models

GPU: 3090. OS: I launched the Docker container from Windows, but that shouldn't matter since it's a Linux Docker container.

JanuszL commented 3 months ago

Hi @Skier23,

In this case, the problem is with NVML support on the Windows platform. To run DALI inside WSL, please also pass DALI_DISABLE_NVML=1 in the Docker environment to make sure DALI doesn't try to use it (add -e DALI_DISABLE_NVML=1 to the docker invocation command).
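Putting the two comments together, the original command with the suggested workaround would look roughly like this (same image, ports, and mount as the command posted above; only the environment variable is new):

```shell
# Same Triton launch command as before, with DALI_DISABLE_NVML=1 added
# so DALI skips NVML, which is not fully supported inside WSL.
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -e DALI_DISABLE_NVML=1 \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:24.03-py3 \
  tritonserver --model-repository=/models
```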

Skier23 commented 3 months ago

Why would it be related to Windows when it's running a Linux Docker container? The host platform shouldn't matter, since that's the whole point of a Docker container, right?

JanuszL commented 3 months ago

Why would it be related to Windows when it's running a Linux Docker container? The host platform shouldn't matter, since that's the whole point of a Docker container, right?

NVML is part of the driver, which is exposed from the host by the Docker runtime. NVML is just not fully supported inside WSL.

Skier23 commented 3 months ago

It's not supported on Windows either, then? This documentation suggests it can at least be installed on Windows and is used for nvidia-smi: https://docs.nvidia.com/deploy/nvml-api/nvml-api-reference.html

It does look like using that environment variable to disable it allows the model to load on Windows, though.

JanuszL commented 3 months ago

WSL is a special case; please check this part of the release notes.

Skier23 commented 3 months ago

Right, but I was just trying to run it natively on Windows (via the Linux Docker container). Is that case also not supported?

JanuszL commented 3 months ago

Right, but I was just trying to run it natively on Windows (via the Linux Docker container). Is that case also not supported?

To the best of my understanding, Docker on Windows uses WSL under the hood: https://docs.docker.com/desktop/install/windows-install/ ("Turn on the WSL 2 feature on Windows").
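For reference, a common way to check whether a seemingly native Linux environment is actually WSL is to look for a Microsoft signature in the kernel version string. This heuristic is an assumption of mine, not something from the thread, but it can help confirm why the NVML workaround is needed:

```python
from pathlib import Path


def running_under_wsl() -> bool:
    """Heuristic: WSL kernels report 'microsoft' in /proc/version."""
    try:
        version = Path("/proc/version").read_text()
    except OSError:
        # /proc/version is absent on non-Linux systems.
        return False
    return "microsoft" in version.lower()


print(running_under_wsl())
```

Run inside the Triton container launched from Docker Desktop on Windows, this would print True, indicating the container is backed by WSL despite being a Linux image.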

Skier23 commented 3 months ago

Gotcha, that makes more sense then. Thanks!

jet082 commented 1 month ago

Can confirm the same problem on my own system using Docker. Unfortunate.

szalpal commented 1 month ago

@jet082 ,

Do any of the above suggestions by @JanuszL help in your case? If not, please provide some details about your system configuration so we can help you.

jet082 commented 1 month ago

@jet082 ,

Do any of the above suggestions by @JanuszL help in your case? If not, please provide some details about your system configuration so we can help you.

I added DALI_DISABLE_NVML=1 and it works, but I think that means this step isn't using the GPU? Am I incorrect about that? Still learning this stuff.

JanuszL commented 1 month ago

Hi @jet082,

but I think that means that this step isn't using the GPU?

This extra variable disables certain functionality, which may lead to a small drop in performance. In most cases it should be negligible, but the GPU is still available and should be used.
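For local experiments outside the Triton container (e.g. testing dali.py directly), the same workaround can be applied from Python by setting the variable before DALI is imported. The placement of the assignment before the import is my assumption about when DALI picks the variable up; in the Triton container, the -e DALI_DISABLE_NVML=1 docker flag discussed above is the supported route:

```python
import os

# Set the workaround variable before importing nvidia.dali, so DALI
# sees it when it initializes and skips NVML (needed under WSL).
os.environ["DALI_DISABLE_NVML"] = "1"

# import nvidia.dali as dali  # import only after the variable is set

print(os.environ["DALI_DISABLE_NVML"])
```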