triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

Unable to load numpy module in a DALI backend #223

Open mvpel opened 6 months ago

mvpel commented 6 months ago

I'm using a very close approximation of the https://docs.nvidia.com/deeplearning/dali/user-guide/docs/math.html example to try to scale 0-255 RGB image values to 0.0-1.0 floating point numbers, due to the way our inference models were trained.

I tried running it without an "import numpy as np" line at first, which threw a NameError, but when I added that line, I got a "no module named numpy" error as Triton was working to load the model:

I0124 23:43:09.142134 210089 dali_backend.cc:43] TRITONBACKEND_Initialize: dali
I0124 23:43:09.142195 210089 dali_backend.cc:50] Triton TRITONBACKEND API version: 1.10
I0124 23:43:09.142203 210089 dali_backend.cc:54] 'dali' TRITONBACKEND API version: 1.10
I0124 23:43:09.142209 210089 dali_backend.cc:71] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0124 23:43:09.142289 210089 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: image_one255_494x648x3 (version 1)
I0124 23:43:09.142295 210089 dali_backend.cc:131] Repository location: /triton.repos.d/image_one255_494x648x3
I0124 23:43:09.142300 210089 dali_backend.cc:142] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/triton.repos.d/image_one255_494x648x3/1/dali.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
I0124 23:43:10.164297 210089 dali_backend.cc:170] TRITONBACKEND_ModelFinalize: delete model state
E0124 23:43:10.164338 210089 model_lifecycle.cc:596] failed to load 'image_one255_494x648x3' version 1: Unknown: DALI Backend error: Failed to load model file. The program looked in the following locations: /triton.repos.d/image_one255_494x648x3/1/dali.py, /triton.repos.d/image_one255_494x648x3/1/dali.py. Please make sure that the model exists in any of the locations and is properly serialized or can be properly serialized.

Here's my full pipeline in the dali.py:

import numpy as np
import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize
import nvidia.dali.types as types

@dali.plugin.triton.autoserialize
@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0, output_dtype=types.FLOAT, output_ndim=[3])
def one255_pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.decoders.image(images, device="cpu")
    images = images / types.Constant(np.float32([255.0, 255.0, 255.0]))
    return images

I'm testing this under Triton v22.08, using the NGC Triton container, because of software approval requirements on our program.

Thanks for any suggestions you can offer!

mvpel commented 6 months ago

It occurred to me to try importing sys to check sys.path, and I found:

-------> sys.path is:  ['', '/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python38.zip',
'/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.8', 
'/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.8/lib-dynload', 
'/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.8/site-packages']

The only references to numpy in these paths were a collection of .h files:

Apptainer> find . -name 'numpy*'
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/loader/numpy_loader.h
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/loader/numpy_loader_gpu.h
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/numpy_reader_gpu_op.h
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/numpy_reader_op.h
Apptainer>

I added a sys.path.append() call to put the /usr/local Python installation's path on sys.path, and that seems to have enabled NumPy to load. An error was still printed, but the pipeline appears to have loaded successfully:

I0125 15:23:32.870758 1421448 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: image_one255_494x648x3 (version 1)
I0125 15:23:32.870836 1421448 dali_backend.cc:131] Repository location: /triton.repos.d/image_one255_494x648x3
I0125 15:23:32.870843 1421448 dali_backend.cc:142] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
I0125 15:23:34.544454 1421448 dali_model.h:175] DALI pipeline from file /triton.repos.d/image_one255_494x648x3/1/dali.py loaded successfully.

Hopefully it will work as intended in spite of the error. Any idea what might be going on? The message is pretty ambiguous.

With respect to Numpy, did I miss a step somewhere? Maybe I need to add Numpy to the DALI backend virtualenv?
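
For anyone hitting the same thing, here is a rough sketch of the sys.path workaround described above. The helper name `ensure_importable` and the `EXTRA_SITE_PACKAGES` path are made up for illustration, so substitute the directory that actually holds numpy on your system. It only touches sys.path when the module cannot already be found:

```python
import importlib.util
import sys


def ensure_importable(module_name, fallback_dir):
    """Append fallback_dir to sys.path if module_name cannot be found.

    Returns True if the module can be located afterwards, False otherwise.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True  # already visible to this interpreter, nothing to do
    if fallback_dir not in sys.path:
        # Appended last, so the backend's own environment is still searched first.
        sys.path.append(fallback_dir)
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Hypothetical placeholder: point this at the site-packages directory that contains numpy.
EXTRA_SITE_PACKAGES = "/path/to/other/python/site-packages"

if not ensure_importable("numpy", EXTRA_SITE_PACKAGES):
    print("numpy still not importable; sys.path is:", sys.path)
```

One caveat: numpy ships compiled extensions, so the appended directory has to come from a Python version compatible with the backend's interpreter (Python 3.8 in the sys.path output above). Otherwise the import will probably fail in a different way.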

banasraf commented 6 months ago

Hey @mvpel, the easiest solution for this case would be not to use numpy at all. You can use the Constant type like this:

images = images / types.Constant(255.0, dtype=types.FLOAT)
# or even better
images = images / 255.

mvpel commented 6 months ago

Nice, thanks! I'm puzzled that the example in the DALI math user guide didn't take that approach.