pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

How to send a torch array via request #3195

Closed lschaupp closed 1 week ago

lschaupp commented 1 week ago

I want to send a torch (CUDA) tensor via a Python request to the inference API. Is that possible?

agunapal commented 1 week ago

AFAIK you can't send it as is. It needs to be serialized.
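
For reference, a minimal sketch of one way to do that, assuming an endpoint at `http://localhost:8080/predictions/my_model` (the model name is hypothetical): the tensor is moved to CPU, serialized with `torch.save`, and posted as the raw request body.

```python
import io

import requests
import torch

# Hypothetical endpoint; TorchServe's inference API listens on port 8080
# by default, and "my_model" stands in for a registered model name.
URL = "http://localhost:8080/predictions/my_model"

device = "cuda" if torch.cuda.is_available() else "cpu"
tensor = torch.rand(3, 224, 224, device=device)

# torch.save writes the tensor into an in-memory buffer; .cpu() avoids
# embedding CUDA storage that the receiving process may not be able to map.
buffer = io.BytesIO()
torch.save(tensor.cpu(), buffer)

response = requests.post(URL, data=buffer.getvalue())
print(response.status_code)
```

On the server side, a custom handler would rebuild the tensor with `torch.load(io.BytesIO(data))`.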

lschaupp commented 1 week ago

@agunapal Would it be possible to have some kind of pinned memory where the torch tensors are loaded, and then simply share the memory pointer via the request? I have larger image files to handle and am trying to find the most efficient way on systems with low CPU performance.

agunapal commented 1 week ago

If you are doing this locally, it should be possible using shared memory. However, if you have a CUDA tensor, I don't think it works. At least, it didn't work previously.
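
For the local CPU case, a minimal sketch of the shared-memory approach (not TorchServe-specific; purely illustrative): `share_memory_()` moves the tensor's storage into shared memory, so a second process receives a handle to the same storage rather than a copy.

```python
import torch
import torch.multiprocessing as mp


def consumer(shared_tensor):
    # Reads (and writes) go straight to the shared storage -- no
    # serialization and no copy of the underlying data.
    print(shared_tensor.mean())


if __name__ == "__main__":
    t = torch.rand(3, 224, 224)  # CPU tensor
    t.share_memory_()            # move its storage into shared memory

    p = mp.Process(target=consumer, args=(t,))
    p.start()
    p.join()
```

CUDA tensors are the caveat mentioned above: their storage lives on the device, and sharing it across processes has extra restrictions, so this sketch only covers CPU tensors.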

lschaupp commented 1 week ago

> If you are doing this locally, it should be possible using shared memory. However, if you have a CUDA tensor, I don't think it works. At least, it didn't work previously.

Thanks for the info. Based on your response, it could work with tensors in shared memory ("cpu"). That would be massively better than sending the file across, IMHO. Is there a working example?

Edit: It is a local instance :)