Closed · YongWookHa closed this issue 2 years ago
Updated the issue.
I thought the handler returned the correct value only on cuda:0. However, it turns out that alternating requests return the correct value, in a cycle of two, even though I run the TorchServe container with a single GPU.
Here's an example.
```python
import requests

x = 'test text'
url = "http://localhost:9080/predictions/my-model"

emb_x = requests.post(url, data=x).json()
emb_x_1 = requests.post(url, data=x).json()
emb_x == emb_x_1  # False
emb_x_2 = requests.post(url, data=x).json()
emb_x == emb_x_2  # True
emb_x_3 = requests.post(url, data=x).json()
emb_x == emb_x_3  # False
emb_x_4 = requests.post(url, data=x).json()
emb_x == emb_x_4  # True
```
@YongWookHa Could you please clarify whether your setup is single GPU or multi-GPU? Also, could you please share some details on the model (for example, with an open-source model) so I can reproduce it?
Closing since there is no follow-up. Please re-open when you get a chance.
🐛 Describe the bug
I am using 2 GPUs. TorchServe inference returns correct values only for predictions run on cuda:0.
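Not part of the original report, but as a hedged diagnostic sketch: TorchServe's management API (default port 8081) can describe a registered model, including its workers and whether each one sits on a GPU, which may help confirm which device each request lands on. The model name `my-model` matches the inference URL above; exact response fields can vary by TorchServe version.

```python
# Hypothetical diagnostic, assuming the default management port 8081
# and the model name used in the inference URL above.
import requests

info = requests.get("http://localhost:8081/models/my-model").json()
for worker in info[0]["workers"]:
    # Each worker entry reports its id, readiness, and GPU placement.
    print(worker["id"], worker["status"], worker.get("gpu"))
```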
Error logs
```python
x_emb_1 != x_emb_2  # True
```
Installation instructions
Model Packaging
My handler looks like this:
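The reporter's actual handler is not shown above. Purely as an illustrative sketch, a custom TorchServe text-embedding handler typically subclasses `BaseHandler`; the class name, the `encode()` method, and the tensor post-processing below are assumptions, not the reporter's code.

```python
# Illustrative sketch only; not the reporter's actual handler.
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyEmbeddingHandler(BaseHandler):
    """Hypothetical text-embedding handler."""

    def preprocess(self, data):
        # TorchServe delivers the request body under "data" or "body", often as bytes.
        text = data[0].get("data") or data[0].get("body")
        if isinstance(text, (bytes, bytearray)):
            text = text.decode("utf-8")
        return text

    def inference(self, text):
        # BaseHandler.initialize() loads self.model and moves it to self.device,
        # which is cuda:<gpu_id> for a GPU worker.
        with torch.no_grad():
            emb = self.model.encode(text)  # encode() is an assumed model API
        return emb

    def postprocess(self, emb):
        # Return one JSON-serializable result per request.
        return [emb.tolist()]
```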
config.properties
Otherwise, default settings.
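The actual `config.properties` contents were not included. A hypothetical minimal file matching the inference port used in the repro above could look like the following; `number_of_gpu` and `default_workers_per_model` are standard TorchServe keys, but the values here are assumptions. Notably, if two workers served the model, requests would be distributed between them, which could account for the observed cycle of two.

```properties
# Hypothetical example, not the reporter's actual file.
inference_address=http://0.0.0.0:9080
number_of_gpu=1
default_workers_per_model=2
```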
Versions
Docker Hub image:
pytorch/torchserve:0.6.0-gpu
Repro instructions
.
Possible Solution
.