[Open] FernandoDorado opened this issue 1 month ago
I don't really understand your question. Could you explain more? By "track all inputs and predictions", do you mean the input tokens/context and the generated tokens? If so, what do you mean by "track"? Currently, we already return the full input tokens and output tokens.
Hello @byshiue
I am asking whether there is a way to store all input data (for example, the payload sent to the model to generate a prediction) together with the corresponding model response, in order to analyse the model's behaviour and track every interaction.
This is an example of the requested functionality, implemented with another tool: https://docs.seldon.io/projects/seldon-core/en/latest/streaming/knative_eventing.html
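In the absence of a built-in request/response log, one way to approximate this is to wrap the client-side inference call and append each payload/response pair as a JSON line for later analysis. This is a minimal sketch, not a Triton feature: `log_interaction` and `dummy_infer` are hypothetical names, and `dummy_infer` stands in for an actual Triton client call.

```python
import json
import time
import uuid


def log_interaction(log_path, infer_fn, payload):
    """Call infer_fn(payload), then append the payload and the response
    (plus an id and a timestamp) as one JSON line for later analysis."""
    response = infer_fn(payload)
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "payload": payload,
        "response": response,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return response


# Usage with a dummy model standing in for a real Triton inference call:
def dummy_infer(payload):
    return {"prediction": payload["x"] * 2}


out = log_interaction("interactions.jsonl", dummy_infer, {"x": 21})
print(out)  # {'prediction': 42}
```

The same wrapper shape applies whether the records end up in a file, a database, or a Kafka topic; only the sink changes.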
Hello,
I am seeking advice on the best practices for tracking all inputs and predictions made by a model when using Triton Inference Server. Specifically, I would like to track every interaction the model handles, including input data and the corresponding predictions.
I have reviewed the documentation about Triton Server Trace, but it is unclear whether this feature can track predictions as well. You can find the documentation here: Triton Server Trace Documentation.
Additionally, I am concerned about the impact of tracking on system latency. While I am aware that solutions for traditional ML platforms (such as Seldon-Core) often use technologies like KNative and Kafka to store tracking information, it is not clear how these approaches can be integrated with Triton without compromising performance.
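One common way to bound the latency cost of tracking is to make the logging asynchronous: the inference path only enqueues a record, and a background thread drains the queue to storage. The sketch below uses a plain file as the sink, but a Kafka producer would play the same role in a Seldon-style setup; the names (`track`, `writer_loop`) are hypothetical, not part of any Triton API.

```python
import json
import queue
import threading

log_queue = queue.Queue()
_SENTINEL = object()  # special value telling the writer to stop


def writer_loop(log_path):
    """Drain records from the queue and append them as JSON lines,
    so disk I/O never happens on the inference path."""
    with open(log_path, "a", encoding="utf-8") as f:
        while True:
            record = log_queue.get()
            if record is _SENTINEL:
                break
            f.write(json.dumps(record) + "\n")
            f.flush()


def track(payload, response):
    """Called on the hot path: enqueue only, non-blocking."""
    log_queue.put({"payload": payload, "response": response})


worker = threading.Thread(target=writer_loop, args=("track.jsonl",), daemon=True)
worker.start()

track({"x": 1}, {"y": 2})
track({"x": 3}, {"y": 6})

log_queue.put(_SENTINEL)  # flush remaining records and stop the writer
worker.join()
```

With this split, the overhead added to each request is a single in-memory enqueue, independent of how slow the downstream sink is.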
I would appreciate recommendations on how to capture all of these interactions with minimal impact on latency.
Thank you for your assistance.