triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

Fix autoconfiguration device selection #229

Closed banasraf closed 4 months ago

banasraf commented 4 months ago

This PR fixes a bug that caused model autoconfiguration to fail.

During autoconfiguration, we need to instantiate the pipeline associated with the model, which usually requires a GPU. The model itself is not bound to any particular device, so we make a best-effort choice of the device on which to instantiate this temporary pipeline.

When the config file contains no information about which device to use, we pick any available GPU (0). If the config file specifies a CPU instance group, we can use CPU_ONLY_DEVICE, because the pipeline should then work on CPU only. Otherwise, we choose any GPU mentioned in the config file.

This logic was broken: CPU_ONLY_DEVICE was chosen even when it shouldn't have been (e.g. when the config contained an incomplete instance-group specification), which caused the build of the temporary pipeline to fail.
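The selection rules above can be sketched as a small Python function. This is an illustrative reconstruction, not the actual dali_backend implementation: the function name, the instance-group representation, and the CPU_ONLY_DEVICE sentinel value are all assumptions made for the sketch.

```python
# Hypothetical sketch of the corrected device-selection logic.
# Assumption: CPU_ONLY_DEVICE is a sentinel meaning "build the pipeline on CPU".
CPU_ONLY_DEVICE = -1

def select_device(instance_groups):
    """Pick a device for the temporary autoconfiguration pipeline.

    instance_groups: list of ("CPU" | "GPU", [gpu_ids]) pairs taken
    from the model config (illustrative representation).
    """
    if not instance_groups:
        # No device information in the config: default to any available GPU.
        return 0
    if all(kind == "CPU" for kind, _ in instance_groups):
        # Every instance group is CPU, so the pipeline must work on CPU only.
        return CPU_ONLY_DEVICE
    for kind, gpu_ids in instance_groups:
        if kind == "GPU" and gpu_ids:
            # Choose any GPU mentioned in the config.
            return gpu_ids[0]
    # Incomplete GPU specification (the buggy path): fall back to GPU 0
    # instead of wrongly returning CPU_ONLY_DEVICE.
    return 0
```

The key fix this mirrors is that CPU_ONLY_DEVICE is only chosen when the config genuinely describes a CPU-only model; an incomplete instance-group specification falls through to a GPU instead.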

dali-automaton commented 4 months ago

CI MESSAGE: [13282018]: BUILD STARTED

dali-automaton commented 4 months ago

CI MESSAGE: [13282018]: BUILD FAILED

dali-automaton commented 4 months ago

CI MESSAGE: [13296515]: BUILD STARTED

dali-automaton commented 4 months ago

CI MESSAGE: [13296515]: BUILD FAILED

dali-automaton commented 4 months ago

CI MESSAGE: [13311347]: BUILD STARTED

dali-automaton commented 4 months ago

CI MESSAGE: [13296515]: BUILD PASSED

dali-automaton commented 4 months ago

CI MESSAGE: [13311347]: BUILD PASSED

dali-automaton commented 4 months ago

CI MESSAGE: [13311347]: BUILD FAILED