triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

PyTorch torch.randperm function nondeterministic when running with ensemble models #6371

Open yongbinfeng opened 1 year ago

yongbinfeng commented 1 year ago

Description
In our models we need to call the torch.randperm function. When running inference on the separate models, the results are always consistent; but when running with ensemble models, the results become random. After some debugging of the inferences, we realized this was caused by the torch.randperm function being nondeterministic.

Triton Information
We are testing with the pre-built image nvcr.io/nvidia/tritonserver:23.08-py3, but as far as we can tell the issue persists across different versions.

To Reproduce
Steps to reproduce the behavior:

We have prepared a repository https://github.com/yongbinfeng/TorchDeterministic/tree/main/models with the models included. When running inference separately on model1 or model2, the outputs are consistent. When running inference on modelEnsemble, the outputs become random.

The script to make the model is also provided: https://github.com/yongbinfeng/TorchDeterministic/blob/main/MakeModel/test.py. Essentially it is just the torch.randperm function.
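For reference, a minimal sketch of such a model: a TorchScript module whose forward pass is essentially torch.randperm. The input/output signature, file name, and repository layout below are assumptions for illustration; the linked test.py is authoritative.

```python
import torch


class RandPerm(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Ignore the input value and return a random permutation of 0..9.
        return torch.randperm(10, dtype=torch.int64)


# Script and save the module so it can be served by Triton's PyTorch backend,
# e.g. as models/model1/1/model.pt (and a copy for model2).
torch.jit.script(RandPerm()).save("model.pt")
```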

One toy client we used for testing can be found here: https://github.com/yongbinfeng/TorchDeterministic/blob/main/client/client.py
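A minimal sketch of what such a client might do, assuming a gRPC endpoint on localhost:8001 and tensor names INPUT__0 / OUTPUT__0 (these names are assumptions; the linked client.py is authoritative):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Dummy input; the model ignores its value and just returns a permutation.
inp = grpcclient.InferInput("INPUT__0", [1], "FP32")
inp.set_data_from_numpy(np.zeros(1, dtype=np.float32))

for _ in range(6):
    result = client.infer(model_name="model1", inputs=[inp])
    print("OUTPUT:")
    print(result.as_numpy("OUTPUT__0"))
```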

For example, when running with model1, we get

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]

consistent outputs every time.

But when running with modelEnsemble, we get:

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[9 3 1 0 4 5 2 8 7 6]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[9 3 1 0 4 5 2 8 7 6]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[8 5 3 6 4 1 0 9 2 7]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[8 5 3 6 4 1 0 9 2 7]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[8 5 3 6 4 1 0 9 2 7]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[9 3 1 0 4 5 2 8 7 6]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[9 3 1 0 4 5 2 8 7 6]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[9 3 1 0 4 5 2 8 7 6]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[8 5 3 6 4 1 0 9 2 7]

OUTPUT:
[5 9 8 2 3 0 4 6 7 1]
[9 3 1 0 4 5 2 8 7 6]

OUTPUT:
[8 5 3 6 4 1 0 9 2 7]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[9 3 1 0 4 5 2 8 7 6]
[5 9 8 2 3 0 4 6 7 1]

OUTPUT:
[8 5 3 6 4 1 0 9 2 7]
[5 9 8 2 3 0 4 6 7 1]

where the outputs jump around and are nondeterministic.

Expected behavior
We expect the outputs to be the same and consistent when running with ensemble models.

kpedro88 commented 1 year ago

Additional info: based on the outputs printed above (which show the same 3 sequences, just in different orders) and the diagram at https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#ensemble-models, we think the ensemble model is sometimes calling the second model before the first model, since they have no explicit dependence. For complete reproducibility, we need to be able to turn off such behavior.
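For illustration, an ensemble config along these lines would reproduce the situation: both steps consume the ensemble input and write independent outputs, so neither step depends on the other and Triton is free to schedule them in either order. This is a sketch only; the model and tensor names are assumptions, not copied from the repository.

```
name: "modelEnsemble"
platform: "ensemble"
max_batch_size: 0
input [ { name: "INPUT", data_type: TYPE_FP32, dims: [ 1 ] } ]
output [
  { name: "OUTPUT0", data_type: TYPE_INT64, dims: [ 10 ] },
  { name: "OUTPUT1", data_type: TYPE_INT64, dims: [ 10 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "model1"
      model_version: -1
      input_map { key: "INPUT__0" value: "INPUT" }
      output_map { key: "OUTPUT__0" value: "OUTPUT0" }
    },
    {
      model_name: "model2"
      model_version: -1
      input_map { key: "INPUT__0" value: "INPUT" }
      output_map { key: "OUTPUT__0" value: "OUTPUT1" }
    }
  ]
}
```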

nnshah1 commented 1 year ago

In order to guarantee one model is called before the other, you would need to create a dependence between the models (the output of one model is fed into the other). Otherwise there is no way to guarantee the order of processing, and the overall application shouldn't rely on that order. Another option would be to create a BLS model that calls one model after the other.

Barring a dependence, I don't think there is any way of guaranteeing the order in which the models process the inputs.
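For example, a minimal sketch of a BLS model (Python backend) that serializes the two calls, so model1 always completes before model2 starts. The model and tensor names are assumptions and no error handling is shown:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")

            # Call model1 and block until it completes ...
            resp1 = pb_utils.InferenceRequest(
                model_name="model1",
                requested_output_names=["OUTPUT__0"],
                inputs=[pb_utils.Tensor("INPUT__0", in_tensor.as_numpy())]).exec()

            # ... and only then call model2, so the order is fixed.
            resp2 = pb_utils.InferenceRequest(
                model_name="model2",
                requested_output_names=["OUTPUT__0"],
                inputs=[pb_utils.Tensor("INPUT__0", in_tensor.as_numpy())]).exec()

            out1 = pb_utils.get_output_tensor_by_name(resp1, "OUTPUT__0")
            out2 = pb_utils.get_output_tensor_by_name(resp2, "OUTPUT__0")
            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("OUTPUT0", out1.as_numpy()),
                pb_utils.Tensor("OUTPUT1", out2.as_numpy()),
            ]))
        return responses
```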

kpedro88 commented 1 year ago

Right now, that seems to be true. It would be nice to add an ensemble model setting to support this case more clearly.

nnshah1 commented 1 year ago

> Right now, that seems to be true. It would be nice to add an ensemble model setting to support this case more clearly.

Can you explain more about the semantics of the setting / use case? If it is to guarantee that one model completes before the other starts, I think using a dependence is the best way to support that.

If there is some unknown dependence between the models in terms of seeds / parameters, even executing them in the same order won't guarantee internal timing. Since the models execute on their own independent threads, there is generally no synchronization between them except as dictated by the ensemble steps.

kpedro88 commented 1 year ago

> Since the models are executing on their own independent threads

If this is correct, then it may be a particularly bad interaction between this mode of running models and the PyTorch backend. In the example here, random calls in one model seem to affect random calls in the other model, which is neither expected nor desired behavior. The model source tries to set the random seed in as many ways as possible.
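For reference, the kind of seeding attempted looks roughly like the sketch below (the actual calls are in the linked MakeModel/test.py; this is an assumption-level summary), and it is still not enough to make the ensemble outputs reproducible:

```python
import random

import numpy as np
import torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)            # seeds the CPU (and CUDA) RNG state
torch.cuda.manual_seed_all(seed)   # explicitly seed all CUDA devices
torch.use_deterministic_algorithms(True)
```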

krishung5 commented 10 months ago

Closing issue due to lack of activity. Please re-open if you would like to follow up on this issue.

nnshah1 commented 10 months ago

@krishung5 - thanks for following up - there are a few open items we are discussing with @kpedro88 - I'll follow up and keep this open as a reminder to finalize any options.