toretak opened this issue 3 years ago
@toretak you need to register the model with the target batch size. This can be done either through the management API, for example
torchserve --start --model-store model_store
curl -X POST "localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/resnet-152-batch_v2.mar&batch_size=3&max_batch_delay=10&initial_workers=1"
or through config.properties if you are using the latest version, as indicated here.
Thanks for the reply @HamidShojanazeri, TS is started with config.properties
torchserve --start --model-store model-store --ts-config /tmp/config.properties
config.properties file:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
job_queue_size=100
load_models=u2net.mar
models={\
"u2net": {\
"1.0": {\
"defaultVersion": true,\
"marName": "u2net.mar",\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 8,\
"maxBatchDelay": 1000,\
"responseTimeout": 120\
}\
}\
}
I've tried different values of maxBatchDelay (500, 1000, 5000) and there is no effect.
Can you print the number of inferences being generated in the inference handler and check logs/model_log.log for it? There should be as many inferences as your batch size.
Also, the BERT example is a good one to see how batching works: you are given a single batch in the inference function, but you want to read back out an inference for each example and return it.
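For example, a check along these lines in a custom handler would show how many requests ended up in one batch and make sure one result goes back per request (just a sketch of the pattern, not your actual u2net handler):

class BatchCheckHandler:
    def preprocess(self, data):
        # `data` is a list with one entry (a dict with "data"/"body") per request
        # that the TorchServe frontend grouped into this batch
        print(f"preprocess received {len(data)} requests in this batch")
        return [row.get("data") or row.get("body") for row in data]

    def postprocess(self, inference_output):
        # must return a list with exactly one result per request, in arrival order
        print(f"postprocess returning {len(inference_output)} results")
        return inference_output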
Also @toretak, I've noticed you're super active on torchserve so if you'd be interested in chatting over Zoom I'd love to set something up to go over your feedback and how you're using torchserve - my email is firstnamelastname@fb.com
Hi @msaroufim, thanks for your reply. I have added debug prints into the inference and preprocess methods in the custom handler, and model_log.log says:
2021-10-04 15:41:58,086 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2021-10-04 15:41:58,086 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - [PID]32
2021-10-04 15:41:58,086 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Torch worker started.
2021-10-04 15:41:58,086 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Python runtime: 3.6.9
2021-10-04 15:41:58,097 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2021-10-04 15:41:58,118 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - model_name: u2net, batchSize: 8
2021-10-04 15:42:26,638 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-04 15:42:26,771 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-04 15:42:26,804 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
2021-10-04 15:42:26,805 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2021-10-04 15:42:26,810 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3487: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
2021-10-04 15:42:26,810 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
2021-10-04 15:42:26,810 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3613: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
2021-10-04 15:42:26,810 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - "See the documentation of nn.Upsample for details.".format(mode)
2021-10-04 15:42:27,267 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1805: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
2021-10-04 15:42:27,268 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
2021-10-04 15:42:27,273 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-10-04 15:42:27,273 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-04 15:42:27,273 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)
2021-10-04 15:42:27,611 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-04 15:42:27,675 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-04 15:42:28,195 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-10-04 15:42:28,195 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-04 15:42:28,195 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-10-04 15:42:28,469 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-04 15:42:28,544 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-04 15:42:29,061 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-10-04 15:42:29,061 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-04 15:42:29,061 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (770, 595)
config.properties is still the same
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
job_queue_size=100
load_models=u2net.mar
models={\
"u2net": {\
"1.0": {\
"defaultVersion": true,\
"marName": "u2net.mar",\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 8,\
"maxBatchDelay": 1000,\
"responseTimeout": 120\
}\
}\
}
According to the log, it looks like TS really did three separate inferences instead of one batched inference...
I read https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/Transformer_handler_generalized.py but I don't see anything special here. Am I missing something?
Thanks a lot...
OK cool, so the most likely culprit seems to be the batch delay. I see you have images of size (3000, 2000) and (1280, 720), so just as a sanity check try 10-100x the batch delay and see if the whole batch is now processed. Also remove the response timeout for this test.
In Transformer_handler_generalized I just wanted you to see how tensors are cat'ed before being passed to the inference handler: https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/Transformer_handler_generalized.py#L156
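i.e. roughly this pattern, where the per-request tensors get concatenated into one batch before inference (a sketch; decode_and_transform is a hypothetical per-image helper, not code from that handler):

import torch

def preprocess(self, requests):
    input_batch = None
    for data in requests:
        # decode one request into a (1, C, H, W) tensor
        input_tensor = self.decode_and_transform(data)
        # concatenate along the batch dimension so inference() sees a single
        # (batch_size, C, H, W) tensor instead of one tensor per request
        input_batch = input_tensor if input_batch is None \
            else torch.cat((input_batch, input_tensor), dim=0)
    return input_batch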
So I've tested this config
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
job_queue_size=100
load_models=u2net.mar
models={\
"u2net": {\
"1.0": {\
"defaultVersion": true,\
"marName": "u2net.mar",\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 8,\
"maxBatchDelay": 100000\
}\
}\
}
and it waited 100 seconds between every image:
model-server@6e6a671cfdf9:~$ cat logs/model_log.log
2021-10-05 11:05:06,584 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2021-10-05 11:05:06,585 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - [PID]32
2021-10-05 11:05:06,585 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Torch worker started.
2021-10-05 11:05:06,585 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Python runtime: 3.6.9
2021-10-05 11:05:06,595 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2021-10-05 11:05:06,618 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - model_name: u2net, batchSize: 8
2021-10-05 11:07:07,358 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-05 11:07:07,480 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-05 11:07:07,519 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
2021-10-05 11:07:07,520 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2021-10-05 11:07:07,525 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3487: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
2021-10-05 11:07:07,525 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
2021-10-05 11:07:07,525 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3613: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
2021-10-05 11:07:07,525 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - "See the documentation of nn.Upsample for details.".format(mode)
2021-10-05 11:07:07,963 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1805: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
2021-10-05 11:07:07,963 [WARN ] W-9000-u2net_1.0-stderr MODEL_LOG - warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
2021-10-05 11:07:07,964 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-10-05 11:07:07,964 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-05 11:07:07,964 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)
2021-10-05 11:08:48,038 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-05 11:08:48,062 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-05 11:08:48,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-10-05 11:08:48,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-05 11:08:48,542 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-10-05 11:10:28,566 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-05 11:10:28,580 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-05 11:10:29,178 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (1, 320, 320)
2021-10-05 11:10:29,178 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-05 11:10:29,179 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (770, 595)
There must be an issue in the handler, I guess ... I am loading all incoming images into a list (https://github.com/Biano-AI/TorchServe-u2net-handler/blob/master/src/custom_handler.py#L135) and then normalizing all loaded images and converting them to tensors in preprocessing (https://github.com/Biano-AI/TorchServe-u2net-handler/blob/master/src/custom_handler.py#L67-L78).
Thanks a lot for your time @msaroufim !
How are you sending requests to torchserve? Are you using the requests library by any chance? In that case the calls will be synchronous. Try sending 2 curl statements with & between them to send 2 parallel requests and see if the issue goes away.
It feels like your handler is fine, because predict_np shape shows it is only getting one image at a time.
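If you'd rather stay in Python than use curl, the same thing can be done by firing the requests concurrently, e.g. with a thread pool (just a sketch; the file names are only examples):

from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://127.0.0.1:8080/predictions/u2net"
files = ["bike.jpg", "boat.jpg", "horse.jpg"]  # example images

def predict(path):
    # each call blocks, so run them in separate threads to give
    # TorchServe several in-flight requests it can group into one batch
    with open(path, "rb") as f:
        return requests.post(URL, data=f).content

with ThreadPoolExecutor(max_workers=len(files)) as pool:
    results = list(pool.map(predict, files))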
@msaroufim you are absolutely right ... I was sending requests using this curl the whole time
curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg,boat.jpg,horse.jpg}"
and that was the issue. When requests are sent like this:
curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{bike.jpg}" & curl -X POST http://127.0.0.1:8080/predictions/u2net -T "{boat.jpg}"
batching works!
...
2021-10-08 07:29:12,653 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2021-10-08 07:29:12,674 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - model_name: u2net, batchSize: 8
2021-10-08 07:29:20,237 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === preprocess called ===
2021-10-08 07:29:20,377 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - === inference in handler called ===
2021-10-08 07:29:21,382 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - predict_np shape (2, 320, 320)
2021-10-08 07:29:21,382 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-08 07:29:21,382 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (1280, 720)
2021-10-08 07:29:21,388 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - postprocessing image 0
2021-10-08 07:29:21,389 [INFO ] W-9000-u2net_1.0-stdout MODEL_LOG - (3000, 2000)
So thank you very much! It's quite a surprise for me ...
This should really be added to the batch inferencing documentation as the example there only shows how to run 1 image. I was pretty confused until I stumbled on this issue.
https://pytorch.org/serve/batch_inference_with_ts.html
""" Run inference to test the model.
$ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg { "tiger_cat": 0.5848360657691956, "tabby": 0.3782736361026764, "Egyptian_cat": 0.03441936895251274, "lynx": 0.0005633446853607893, "quilt": 0.0002698268508538604 } """
Agreed lemme reopen this to keep track
@toretak were you able to get some python asyncio code working asynchronously with Torchserve API?
@Vert53 Hi, we actually don't need python asyncio in the TS handler directly, so I didn't test it. In fact I can't imagine a use case for it ... but it should work, I suppose. Do you have some (probably not working) implementation?
Hi @toretak, what I meant is to use asyncio for the requesting, not the serving (handler). I managed to write this async code to test how fast torchserve worked on my setup using the ImageNet dataset. Sharing it in case it is of any use.
import json
import time
import aiohttp
import asyncio
import aiofiles
from torchvision.datasets import ImageFolder
from aiofiles.threadpool import AsyncBufferedReader
from typing import Tuple

valdir = '/pytorch/imagenet/ILSVRC2012/val'
index_to_name_path = 'index_to_name.json'  # mapping {'791': ['n04204347', 'shopping_cart'], .....}


class ImageNetLoader:
    """Async iterator yielding (raw image bytes, target class index) pairs."""

    def __init__(self, folder: ImageFolder):
        self.folder = folder
        self.iter_samples = iter(folder.samples)

    def __aiter__(self):
        return self

    async def __anext__(self) -> Tuple[AsyncBufferedReader, int]:
        try:
            sample_path, target = next(self.iter_samples)
        except StopIteration:
            raise StopAsyncIteration
        async with aiofiles.open(sample_path, 'rb') as sample_file:
            sample = await sample_file.read()
        return sample, target


async def infer_request(session: aiohttp.ClientSession,
                        url: str,
                        sample: AsyncBufferedReader,
                        target: int,
                        queue: asyncio.Queue) -> None:
    # POST one image and push (prediction, target) onto the queue
    async with session.post(url, data=sample) as response:
        if response.status == 200:
            output = await response.text()
            await queue.put((output, target))


async def inference_session(url, loader, queue) -> None:
    # fire one request task per sample so TorchServe can batch them
    async with aiohttp.ClientSession() as session:
        infers = []
        async for sample, target in loader:
            infers.append(asyncio.create_task(infer_request(session=session,
                                                            url=url,
                                                            sample=sample,
                                                            target=target,
                                                            queue=queue)))
        await asyncio.gather(*infers, return_exceptions=True)


class ImagenetPostProcessor:
    def __init__(self):
        self.correct_predictions = 0
        self.total_predictions = 0

    async def postprocess_results(self,
                                  queue: asyncio.Queue,
                                  index_to_name: dict) -> None:
        # consume (prediction, target) pairs and keep a running top-1 accuracy
        while True:
            output, target = await queue.get()
            top_1_prediction = next(iter(json.loads(output).keys()))
            target_str = index_to_name[str(target)][1]
            if top_1_prediction == target_str:
                self.correct_predictions += 1
            self.total_predictions += 1
            queue.task_done()


async def main(url, loader):
    postp = ImagenetPostProcessor()
    with open(index_to_name_path) as f:
        index_to_name = json.load(f)
    queue = asyncio.Queue()
    producer = asyncio.create_task(inference_session(url, loader, queue))
    consumer = asyncio.create_task(postp.postprocess_results(queue, index_to_name))
    await producer
    await queue.join()
    consumer.cancel()
    print(f'total pred {postp.total_predictions}')
    return postp.correct_predictions / postp.total_predictions


if __name__ == '__main__':
    imagenet_folder = ImageFolder(valdir)
    imagenet_loader = ImageNetLoader(imagenet_folder)
    a = time.time()
    correct = asyncio.run(
        main('http://localhost:8080/predictions/resnet50', imagenet_loader)
    )
    b = time.time()
    print(b - a)
    print(correct)
Here's an example of sending async batched requests using python https://github.com/pytorch/serve/tree/master/examples/image_classifier/near_real_time_video
I've been looking through previous issues, but I could not find a satisfying answer.
I have packed the model using model-archiver in Docker.
Then I run the model in Docker.
Then I call the model multiple times
or from python
but in the TS log I can see that requests are processed sequentially.
Context
We would like to batch multiple requests and run the inference just once for the whole batch.
Your Environment
There is a full repository to reproduce https://github.com/Biano-AI/TorchServe-u2net-handler
Custom handler
Expected Behavior
I understand from the documentation that TS should be able to aggregate multiple requests and call the model just once. If not, my apologies...
Thanks