Describe the bug
The current cluster configuration has been well tested on an NVIDIA V100 GPU and on a typical segmentation workflow. However, depending on the model and the hardware used in future clusters, there are a few settings that may need to be tweaked.
tf-serving
MAX_BATCH_SIZE: The maximum number of batches that tf-serving will process in a given duty cycle. If the job is using very large input tensors, this batch size may need to be decreased.
MAX_ENQUEUED_BATCHES: The number of batches that will sit in the work queue waiting to be processed. If the requests have a very large payload, tf-serving may be evicted due to memory issues, and this parameter should be decreased. See the sketch below for how these two settings fit together.
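For reference, here is a minimal sketch of how these two values might be templated into TensorFlow Serving's batching parameters file (passed via --enable_batching --batching_parameters_file). The env-var names, defaults, output path, and the num_batch_threads / batch_timeout_micros values are assumptions for illustration, not the cluster's actual defaults:

```python
import os

# Assumed env-var names and fallback values; the real values come from the cluster config.
max_batch_size = int(os.getenv("MAX_BATCH_SIZE", "64"))
max_enqueued_batches = int(os.getenv("MAX_ENQUEUED_BATCHES", "128"))

# TensorFlow Serving reads these as a text-format BatchingParameters proto.
batching_config = (
    "max_batch_size { value: %d }\n"
    "max_enqueued_batches { value: %d }\n"
    "batch_timeout_micros { value: 0 }\n"  # assumed: do not wait to fill a batch
    "num_batch_threads { value: 4 }\n"     # assumed thread count
) % (max_batch_size, max_enqueued_batches)

# Assumed output path; point --batching_parameters_file at this file when starting tf-serving.
with open("batching_config.txt", "w") as f:
    f.write(batching_config)
```

Lowering max_batch_size shrinks the largest tensor tf-serving will assemble per duty cycle, while lowering max_enqueued_batches caps how much queued work (and memory) can pile up behind it.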
redis-consumer
TF_MAX_BATCH_SIZE: The number of batches to send to the model server. This value MUST be less than or equal to MAX_BATCH_SIZE above and may need to be altered for future workflows.
GRPC_TIMEOUT: The length of time to wait for a gRPC inference request. If a model's inference time is quite slow, this may need to be adjusted to prevent timeouts. A sketch of how both consumer settings are used follows below.
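A minimal sketch of how TF_MAX_BATCH_SIZE and GRPC_TIMEOUT might be used when the consumer issues gRPC Predict requests. The host, model name, and input tensor name are assumptions for illustration; the real logic lives in redis-consumer:

```python
import os

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Assumed env-var names and fallback values.
TF_MAX_BATCH_SIZE = int(os.getenv("TF_MAX_BATCH_SIZE", "32"))  # must be <= MAX_BATCH_SIZE
GRPC_TIMEOUT = float(os.getenv("GRPC_TIMEOUT", "30"))          # seconds


def predict(images, host="tf-serving:8500", model_name="segmentation"):
    """Send `images` (an N x H x W x C array) to tf-serving in TF_MAX_BATCH_SIZE chunks."""
    channel = grpc.insecure_channel(host)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    results = []
    for start in range(0, len(images), TF_MAX_BATCH_SIZE):
        batch = images[start:start + TF_MAX_BATCH_SIZE]
        request = predict_pb2.PredictRequest()
        request.model_spec.name = model_name
        request.inputs["image"].CopyFrom(tf.make_tensor_proto(batch))  # assumed input name
        # GRPC_TIMEOUT bounds each inference call; slow models need a larger
        # value here to avoid DEADLINE_EXCEEDED errors.
        response = stub.Predict(request, timeout=GRPC_TIMEOUT)
        results.append(response)
    return results
```

Keeping TF_MAX_BATCH_SIZE at or below MAX_BATCH_SIZE ensures the consumer never sends a request larger than what tf-serving's batcher is configured to accept.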
Additional context
For more notes on the interplay between these settings and the hardware itself, please review this related issue.