Closed: chauhang closed this issue 1 year ago
Hmm, this is a tougher one that we need to flesh out; the same infrastructure will also help us with time-series predictions.
In general we do support inference on several frames in the same batch, but we don't have a good way of saving inference results across batches.
We've previously discussed caching results from the handler with something like Redis or a remote object store, which shouldn't be too hard to support, or adding native support for this.
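To make the caching idea concrete, here is a minimal sketch of how a handler could accumulate per-batch results keyed by request and batch index. The `ResultCache` class and its method names are hypothetical; a production version would back this with Redis (e.g. `redis-py`'s `set()`/`get()`) or a remote object store instead of the in-memory dict used here.

```python
import json


class ResultCache:
    """Hypothetical cache interface for saving inference results across
    batches. The in-memory dict is a stand-in for Redis or a remote
    object store."""

    def __init__(self):
        self._store = {}

    def put(self, request_id, batch_index, predictions):
        # Key by request and batch index so results accumulate across batches.
        self._store[f"{request_id}:{batch_index}"] = json.dumps(predictions)

    def get_all(self, request_id, num_batches):
        # Reassemble per-batch predictions in order once all batches are done.
        return [
            json.loads(self._store[f"{request_id}:{i}"])
            for i in range(num_batches)
        ]


# Usage: a handler's postprocess step could store each batch's output,
# and a final call could aggregate them for the whole video.
cache = ResultCache()
cache.put("video-123", 0, [{"frame": 0, "label": "walking"}])
cache.put("video-123", 1, [{"frame": 1, "label": "running"}])
results = cache.get_all("video-123", 2)
```

Swapping the dict for a Redis client would give the same behavior across worker processes, which is what makes results survive from one batch to the next.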
@chauhang The large-video inference blog is an example on SageMaker. After a long investigation with our Solutions Architect, they were finally convinced that sending a large video directly to the model server is not a good solution for production, because the high I/O causes high latency. For production, the video should be segmented into frames before it is sent to the model server. Many customers now build pipelines on EKS to process large videos this way.
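The segment-before-send approach above can be sketched as a simple batching step. This is an assumption-laden illustration: a real pipeline would decode the video with OpenCV or ffmpeg and POST each batch of frames to the model server, while here plain integers stand in for decoded frames.

```python
def batch_frames(frames, batch_size):
    """Split a decoded frame sequence into fixed-size batches so each
    request to the model server stays small (sketch only; integers
    stand in for decoded image frames)."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]


# Stand-in for 10 decoded frames; in practice these would be image tensors.
frames = list(range(10))
batches = batch_frames(frames, 4)
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Sending small batches like these, rather than the whole video payload, is what keeps per-request I/O and latency bounded.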
Add an example demonstrating video inference, where a video payload is sent and predictions are made for each frame, e.g. activity recognition across a video stream.
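For the activity-recognition use case requested here, the final step would be reducing per-frame predictions to one video-level label. A minimal sketch, assuming the model emits one label per frame (real activity-recognition models often operate on clips of frames rather than single frames):

```python
from collections import Counter


def aggregate_activity(frame_predictions):
    """Majority-vote over per-frame labels to produce one video-level
    activity label (sketch; clip-level models may aggregate differently)."""
    counts = Counter(frame_predictions)
    return counts.most_common(1)[0][0]


# Example per-frame labels for a short video.
preds = ["walking", "walking", "running", "walking"]
print(aggregate_activity(preds))  # prints "walking"
```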