triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
MIT License

Unable to load numpy module in a DALI backend #223

Open mvpel opened 6 months ago

mvpel commented 6 months ago

I'm using a very close approximation of the example to try to scale 0-255 RGB image values to 0.0-1.0 floating point numbers, due to the way our inference models were trained.

I first tried running it without an "import numpy as np" line, which raised a NameError. When I added that line, I got a "No module named 'numpy'" error while Triton was loading the model:

I0124 23:43:09.142134 210089] TRITONBACKEND_Initialize: dali
I0124 23:43:09.142195 210089] Triton TRITONBACKEND API version: 1.10
I0124 23:43:09.142203 210089] 'dali' TRITONBACKEND API version: 1.10
I0124 23:43:09.142209 210089] backend configuration:
I0124 23:43:09.142289 210089] TRITONBACKEND_ModelInitialize: image_one255_494x648x3 (version 1)
I0124 23:43:09.142295 210089] Repository location: /triton.repos.d/image_one255_494x648x3
I0124 23:43:09.142300 210089] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/triton.repos.d/image_one255_494x648x3/1/", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
I0124 23:43:10.164297 210089] TRITONBACKEND_ModelFinalize: delete model state
E0124 23:43:10.164338 210089] failed to load 'image_one255_494x648x3' version 1: Unknown: DALI Backend error: Failed to load model file. The program looked in the following locations: /triton.repos.d/image_one255_494x648x3/1/, /triton.repos.d/image_one255_494x648x3/1/ Please make sure that the model exists in any of the locations and is properly serialized or can be properly serialized.

Here's my full pipeline in the

import numpy as np
import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize
import nvidia.dali.types as types

@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0, output_dtype=types.FLOAT, output_ndim=[3])
def one255_pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.decoders.image(images, device="cpu")
    images = images / types.Constant(np.float32([255.0, 255.0, 255.0]))
    return images

I'm testing this under Triton v22.08, due to some program software approval requirements here, using the NGC Triton container.

Thanks for any suggestions you can offer!

mvpel commented 6 months ago

It occurred to me to try importing sys to check sys.path, and I found:

-------> sys.path is:  ['', '/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/',

The only references to numpy in these paths were a collection of .h files:

Apptainer> find . -name 'numpy*'

I threw in a sys.path.append() to add the /usr/local Python installation's path to sys.path, and that seems to have enabled NumPy to load. It still threw an error, but the pipeline appears to have loaded successfully:

I0125 15:23:32.870758 1421448] TRITONBACKEND_ModelInitialize: image_one255_494x648x3 (version 1)
I0125 15:23:32.870836 1421448] Repository location: /triton.repos.d/image_one255_494x648x3
I0125 15:23:32.870843 1421448] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
I0125 15:23:34.544454 1421448 dali_model.h:175] DALI pipeline from file /triton.repos.d/image_one255_494x648x3/1/
loaded successfully.

Hopefully it will work as intended in spite of the error. Any idea what might be going on? The message is pretty ambiguous.

With respect to Numpy, did I miss a step somewhere? Maybe I need to add Numpy to the DALI backend virtualenv?
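
For anyone following along, here is a minimal sketch of the diagnosis and workaround described above, using only the standard library. The helper names (`module_available`, `add_search_path`) and the `extra_site_packages` path are hypothetical placeholders, not part of DALI or Triton; substitute the directory that actually contains numpy in your container.

```python
import importlib
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


def add_search_path(path):
    """Append `path` to sys.path once; return True if it was added."""
    if path in sys.path:
        return False
    sys.path.append(path)
    return True


if __name__ == "__main__":
    # Show which directories the backend's interpreter actually searches.
    print("sys.path is:", sys.path)
    print("numpy importable:", module_available("numpy"))

    # Hypothetical location: replace with the real site-packages directory.
    extra_site_packages = "/path/to/site-packages"
    if not module_available("numpy"):
        add_search_path(extra_site_packages)
        # Make the import system notice the newly added directory.
        importlib.invalidate_caches()
        print("numpy importable after append:", module_available("numpy"))
```

One caveat with this approach: packages with compiled extensions, numpy included, are built for a specific Python version, so borrowing them from a different Python installation may fail to import if the versions do not match.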

banasraf commented 6 months ago

Hey @mvpel, the easiest solution in this case would be not to use numpy at all. You can use the Constant type like this:

images = images / types.Constant(255.0, dtype=types.FLOAT)
# or even better
images = images / 255.

mvpel commented 6 months ago

Nice, thanks! I'm puzzled that the example in the DALI math user guide didn't take that approach.