hudengjunai opened 1 year ago
Hello, I have launched opt-125M inference and sent requests to the server with Locust, but no matter how I configure max_batch_size, the InferenceEngine always runs with batch_size = 1. How can I use the dynamic batching feature in Batch_server_manager?
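For reference, this is a minimal sketch of the kind of Locust load test involved, not the reporter's actual script. The endpoint path `/generation` and the JSON payload fields are assumptions; adjust them to whatever the server actually exposes. Note that all simulated users send identical generation parameters, which matters for the batching behavior discussed below.

```python
# Hypothetical locustfile.py: endpoint path and payload shape are
# assumptions, not confirmed by the original report.
from locust import HttpUser, between, task


class GenerationUser(HttpUser):
    wait_time = between(0.1, 0.5)  # short waits keep many requests in flight

    @task
    def generate(self):
        # Identical generation parameters across users make concurrent
        # requests eligible to be batched together on the server side.
        self.client.post(
            "/generation",
            json={"prompt": "Hello, world", "max_new_tokens": 64},
        )
```

Run it against the inference server with, e.g., `locust -f locustfile.py --host http://localhost:8020` (the port is an assumption) and enough concurrent users that multiple requests are queued at once; with only one in-flight request, a batch size of 1 is expected.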
It will only form a batch when a set of queued inputs can actually be batched together, e.g. when they have the same number of generation steps.
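To illustrate the constraint described above, here is a minimal sketch of that grouping rule. It is not the actual Batch_server_manager implementation; the `Request` fields and `pop_batch` helper are illustrative, with the "generation steps" condition modeled as matching `max_new_tokens` values.

```python
# Sketch only: requests are merged into one batch only when they share
# the same number of generation steps (modeled here as max_new_tokens).
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    max_new_tokens: int  # requests can batch only if this matches


def pop_batch(queue: deque, max_batch_size: int) -> list:
    """Pop up to max_batch_size requests that share the head request's
    generation-step count; incompatible requests stay queued."""
    if not queue:
        return []
    head = queue.popleft()
    batch = [head]
    leftovers = deque()
    # Scan the queue once, collecting compatible requests.
    while queue and len(batch) < max_batch_size:
        req = queue.popleft()
        if req.max_new_tokens == head.max_new_tokens:
            batch.append(req)
        else:
            leftovers.append(req)
    leftovers.extend(queue)  # keep any unscanned requests queued
    queue.clear()
    queue.extend(leftovers)
    return batch


if __name__ == "__main__":
    q = deque([
        Request("a", 64), Request("b", 64),
        Request("c", 128), Request("d", 64),
    ])
    print([r.prompt for r in pop_batch(q, max_batch_size=4)])  # ['a', 'b', 'd']
    print([r.prompt for r in pop_batch(q, max_batch_size=4)])  # ['c']
```

Under this rule, if every incoming request asks for a different number of generation steps, or requests arrive one at a time, each batch contains exactly one request regardless of max_batch_size, which matches the behavior reported above.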