edmuthiah opened this issue 3 years ago (Open)
Hi @ed-muthiah do you feel like your use case would be covered by increasing the batch size of your model and taking care of using multiple outputs in the inference function of a handler? https://github.com/aws-samples/amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve/blob/main/deployment/handler.py#L85
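To make the batching suggestion concrete, here is a minimal sketch of the idea with a stand-in `fake_model` function (a real handler would call the loaded PyTorch model instead). With batch size 2, two consecutive frames arrive together, and the handler must return exactly one response per request, in order:

```python
def fake_model(batch):
    # Stand-in for a real detector: returns one result per input frame.
    return [{"frame_sum": sum(frame)} for frame in batch]

def inference(batch):
    # TorchServe hands the handler up to `batch_size` requests at once;
    # the handler must return one response per request, in the same order,
    # so each client gets the result for the frame it sent.
    outputs = fake_model(batch)
    assert len(outputs) == len(batch)
    return outputs

# Two "consecutive frames" batched together:
responses = inference([[1, 2, 3], [4, 5, 6]])
```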
If that doesn't work you can also write your inferences in a file in the inference function and load them in subsequent calls to the handler
Let me know if you'd like more detail since I think it'd be cool to have an example like this in the repo
@msaroufim Hey Mark, I think the second method of writing every inference to file and reloading it back might significantly impact inference times compared to having the previous state available in memory.
With regards to batch processing, could you please provide a bit more detail on how this could accommodate sequential information? If I set the batch size to 2 to use two consecutive frames in a video, but there are 4 workers, how would we ensure that the correct worker receives the correct sequential frame?
I had a look through the handler you linked; it seems to me like a standard handler without batch processing or any writing/loading to file (apart from the .pth).
I'd be happy to contribute to this as it's probably going to be a common request as people try out any model that uses series data.
@ed-muthiah this is an interesting use-case. TorchServe currently does not support linking data to a specific worker. Finer control for users is on our roadmap for future versions, however the plan so far is about finer control at the device level (selecting the gpu:id).
I wonder if, as a workaround for your current use-case, there is a way to sign your input (similar to this sample) and return the corresponding response (similar to here), so that post-processing/aggregating inference can be done on the client-side.
We welcome your proposal and happy to discuss it further.
Hi @HamidShojanazeri and @msaroufim thanks for taking an interest in this issue. Personally I'm working with sequential images from a video. Would you have any idea how we could sign an image?
Furthermore, I'm not sure I understand what signing achieves. Consider three different streams of images; let's say some images are signed "Stream = A", some "Stream = B" and some "Stream = C".
Personally, I also think aggregation of inference should stay on the server-side. Consider the scenario where you are working with a CPU-only machine and sending sequential video frames to a remote GPU-enabled TorchServe server. Ideally you would simply want images to be sent to the server and receive back JSON inference results. For example, in object tracking you would only want to receive back what objects are in the image, their identity, and where they are now located compared to the previous frame. Managing this on the TorchServe side greatly reduces the compute requirements on the client.
This issue is applicable to any sequential data but I'd like to focus on tracking by detection to narrow the scope of what we're discussing:
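To anchor the tracking-by-detection scope, here is a hedged, minimal sketch of the kind of per-frame aggregation being discussed: a greedy IoU matcher that assigns each new detection the ID of the best-overlapping track from the previous frame. The box format, threshold, and `associate` function are all illustrative assumptions, not TorchServe APIs:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(prev_tracks, detections, threshold=0.3):
    # Greedy matching: each detection inherits the ID of the best
    # overlapping previous track, otherwise it starts a new track.
    next_id = max(prev_tracks, default=-1) + 1
    tracks, used = {}, set()
    for det in detections:
        best_id, best_iou = None, threshold
        for tid, box in prev_tracks.items():
            if tid in used:
                continue
            score = iou(box, det)
            if score > best_iou:
                best_id, best_iou = tid, score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        tracks[best_id] = det
    return tracks
```

This is exactly the step that needs the previous frame's results in memory, which is why handler statefulness matters here.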
@ed-muthiah I see, I think I missed the part that you need the previous frames' predictions contributing to the next frame. My suggestion was to find a workaround to associate the responses from stream A being served on different workers.
As all the workers serve the same model, the idea of the signature is just to make sure that the inference produced by any worker can be associated back to the data stream it originated from. For example, if you have two video frames from stream A arriving in different batches (or in a mixed batch of data from streams A and B), one frame from stream A goes to worker1 and the other goes to worker2; the returned responses are then identifiable with respect to the original stream.
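The signing idea might look something like the following sketch, where the client attaches a `stream_id` field to each request body and the handler echoes it back with the prediction. The field name, the JSON request shape, and the dummy prediction are all assumptions for illustration:

```python
import json

def handle_batch(requests):
    # Each request is assumed to carry a JSON body with a "stream_id"
    # the client attached (the "signature") plus the frame payload.
    responses = []
    for req in requests:
        body = json.loads(req["body"])
        prediction = {"boxes": []}  # stand-in for real inference output
        # Echo the signature so the response can be matched back to its
        # stream no matter which worker produced it.
        responses.append({"stream_id": body["stream_id"], **prediction})
    return responses

reqs = [
    {"body": json.dumps({"stream_id": "A", "frame": 7})},
    {"body": json.dumps({"stream_id": "B", "frame": 3})},
]
out = handle_batch(reqs)
```

The client can then group responses by `stream_id` and reorder them, even when frames from the same stream were served by different workers.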
For sure, it's ideal to persist it on the server side, though in this setting that would affect the handler time. By post-processing on the client side, what I meant was aggregating predictions related to each frame of a video, which I think in this case might not help.
CC @maaquib
Gotcha, now I get what you mean. But yes, as you said, the sequential nature of this makes it a bit tougher.
A few comments on what I think we need to set up a working example: the handler would need to wait until the next frame is received. This would essentially keep the handler running in a loop, processing frames as they come in and outputting inference results. There might be a few flaws in my understanding of TorchServe, as I'm not sure whether the handler script must run fully to completion before TorchServe actually outputs anything, so please let me know :)
@HamidShojanazeri @msaroufim @maaquib
What about an in memory database like redis or memcached? Should work in principle similarly to writing to a file without the slowdown from reading and writing to disk.
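The shape of that suggestion might look like the sketch below. A plain dict-backed class stands in for the external store so the pattern is runnable here; in a real deployment the `get`/`set` calls would go through a client such as redis-py or pymemcache against a shared server, so state written by one worker is visible to the others without touching disk. All names here are illustrative assumptions:

```python
class KVStore:
    """Stand-in for an external in-memory store (e.g. Redis/memcached).

    Every worker would talk to the same store server, so per-stream
    state survives across workers without disk I/O.
    """
    _data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

store = KVStore()

def infer_frame(stream_id, frame):
    # Load the previous result for this stream, use it, then update it.
    prev = store.get(f"prev:{stream_id}")
    result = {"frame": frame, "prev_frame": prev["frame"] if prev else None}
    store.set(f"prev:{stream_id}", result)
    return result
```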
This could work, I'll investigate a bit more.
@msaroufim Looks like Redis cannot be used in a stream processing or ML engine, as per the license agreement: https://redislabs.com/wp-content/uploads/2019/09/redis-source-available-license.pdf
Could someone please confirm if TorchServe is considered a "(f) machine learning or deep learning or artificial intelligence serving engine;"
Is your feature request related to a problem? Please describe.
I'm looking to pass multiple feeds of sequential data from local instances to torchserve running on a remote instance. The current handlers such as the object detector seem to distribute the incoming requests to different workers and run the handler but consider the incoming data to be independent of previous data. I'm looking to understand how I can create a persistent handler.
Describe the solution you'd like
For example, say I have sequential data streams A, B and C posting data to the TorchServe server. I would like to persist information from the results of the previous handler call and update it at every timestep as new data comes in sequentially. Each worker must only receive information from the stream (A, B or C) that it had previously received a request from.
A use case for this is speech-to-text or object tracking, where new input words or images relate to prior inputs.
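One detail worth noting for this feature request: a TorchServe handler instance lives as long as its worker process, so instance attributes do persist across requests handled by that worker. A minimal sketch of per-stream state kept that way is below; the caveat from the discussion still applies, since TorchServe does not pin a stream to a worker, so with multiple workers each one only sees the frames routed to it. The class and method names are illustrative, not a TorchServe API:

```python
class StatefulHandler:
    """Sketch of a handler keeping per-stream state in worker memory.

    Attributes set on the instance persist across requests served by
    the same worker process, but NOT across different workers.
    """

    def __init__(self):
        self.prev = {}  # stream_id -> last frame seen for that stream

    def handle(self, stream_id, frame):
        # Use the previous frame from this stream (if this worker saw one),
        # then record the current frame for the next call.
        result = {"frame": frame, "prev": self.prev.get(stream_id)}
        self.prev[stream_id] = frame
        return result
```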
@harshbafna @HamidShojanazeri