Open yongbinfeng opened 1 year ago
Additional info: based on the outputs printed above (which show the same 3 sequences, just in different orders) and the diagram at https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#ensemble-models, we think the ensemble model is sometimes calling the second model before the first model, since they have no explicit dependence. For complete reproducibility, we need to be able to turn off such behavior.
In order to guarantee that one model is called before the other, you would need to create a dependence between the models (the output of one model is fed into the other). Otherwise there is no way to guarantee the order of processing, and the overall application shouldn't rely on the order. Another option would be to create a BLS model that calls one model after the other.
Barring a dependence, I don't think there is any way of guaranteeing the order in which the models process the inputs.
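For reference, a dependence is expressed in the ensemble's `config.pbtxt` by routing one step's output into the next step's input. This is a minimal sketch only; the model names, tensor names, and dims are placeholders and would need to match your actual models:

```
name: "modelEnsemble"
platform: "ensemble"
max_batch_size: 8
input [ { name: "INPUT0" data_type: TYPE_FP32 dims: [ 16 ] } ]
output [ { name: "OUTPUT1" data_type: TYPE_FP32 dims: [ 16 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "model1"
      model_version: -1
      input_map { key: "IN" value: "INPUT0" }
      # model1's output becomes an internal ensemble tensor...
      output_map { key: "OUT" value: "model1_out" }
    },
    {
      model_name: "model2"
      model_version: -1
      # ...which model2 consumes, so model2 cannot start before
      # model1 has produced it.
      input_map { key: "IN" value: "model1_out" }
      output_map { key: "OUT" value: "OUTPUT1" }
    }
  ]
}
```

With this wiring the scheduler has no choice but to run model1 first; without it, the two steps are independent and may be dispatched in either order.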
Right now, that seems to be true. It would be nice to add an ensemble model setting to support this case more clearly.
Can you explain the semantics of the setting and the use case in more detail? If it is to guarantee that one model completes before the other starts, I think using a dependence is the best way to support that.
If there is some unknown dependence between the models in terms of seeds / parameters, even executing them in the same order won't guarantee internal timing. Since the models execute on their own independent threads, there is generally no synchronization between them except as dictated by the ensemble steps.
> Since the models are executing on their own independent threads
If this is correct, then it may be a particularly bad interaction between this mode of running models and the PyTorch backend. In the example here, random calls in one model seem to affect random calls in the other model, which is neither expected nor desired behavior. The model source tries to set the random seed in as many ways as possible.
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up on it.
@krishung5 - thanks for following up. There are a few open items we are discussing with @kpedro88 - I'll follow up and keep this open as a reminder to finalize any options.
Description
In our models we need to call the `torch.randperm` function. When running inferences on the models separately, the results are always consistent; but when running with ensemble models, the results become random. After some debugging of the inferences, we realized it was caused by the `torch.randperm` function being nondeterministic.

Triton Information
We are testing with the pre-built image `nvcr.io/nvidia/tritonserver:23.08-py3`, but as far as we can tell the issue persists across versions.

To Reproduce
Steps to reproduce the behavior.
We have prepared a repository https://github.com/yongbinfeng/TorchDeterministic/tree/main/models with the models included. When running inferences separately on model1 or model2, the outputs are consistent. When running inferences on modelEnsemble, they become random.
The script to make the model is also provided: https://github.com/yongbinfeng/TorchDeterministic/blob/main/MakeModel/test.py Essentially it is just the `torch.randperm` function. One toy client we used for testing can be found here: https://github.com/yongbinfeng/TorchDeterministic/blob/main/client/client.py
For example, when running with `model1`, we get consistent outputs every time. But when running with `modelEnsemble`, the outputs start to jump around and become nondeterministic.
Expected behavior
We expect the outputs to be the same and consistent with ensemble models as well.
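One possible mitigation, independent of execution order, is to make each random call depend only on an explicit, locally seeded generator rather than on global RNG state; in PyTorch this would mean `torch.randperm(n, generator=g)` with a per-model `torch.Generator` seeded via `manual_seed`. This stdlib-only analog illustrates the idea (`deterministic_perm` is a hypothetical helper name, not from the repo above):

```python
import random

def deterministic_perm(n, seed):
    # Each call uses its own freshly seeded generator, so the result
    # depends only on (n, seed), never on global RNG state that other
    # models or threads may have advanced.
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

# Same inputs always give the same permutation...
assert deterministic_perm(8, 42) == deterministic_perm(8, 42)

# ...even if unrelated draws from the global RNG happen in between.
random.random()
assert deterministic_perm(8, 42) == deterministic_perm(8, 42)
```

Whether this fully resolves the ensemble case depends on the PyTorch backend not sharing any other hidden RNG state between the two models.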