Hi!
First, thanks for the contribution so far.
It is well known that balancing max_workers, maximum_concurrent_rpcs, and the number of pods/servers can be tricky. One important way to see how that balance is holding up is to monitor the size of the work queue in the server's thread pool. It would be a great addition to this lib.
I have hacked my way around it for my use case, and I'm open to contributing it here. What do you think? Below is a rough sketch of the code:
from threading import current_thread

from py_grpc_prometheus.prometheus_server_interceptor import PromServerInterceptor


class ThreadPoolQueueMetricServerInterceptor(PromServerInterceptor):
    """This interceptor measures the size of the work queue in the
    gRPC server thread pool at the end of each RPC execution.
    """

    ...  # metric setup omitted; self.queue_size_gauge is assumed to be a prometheus_client Gauge

    def _measure_queue_size(self):
        # The handler runs on a ThreadPoolExecutor worker thread. CPython starts
        # each worker with the target arguments
        # (executor_reference, work_queue, initializer, initargs),
        # so the work queue can be recovered from the thread's private _args.
        thread = current_thread()
        _, work_queue, _, _ = thread._args
        self.queue_size_gauge.set(work_queue.qsize())

    def intercept_service(self, continuation, handler_call_details):
        def wrapper(behavior_fn, *args):
            def new_behavior(request, context):
                # Call the original behaviour, then record the queue size.
                response_or_iterator = behavior_fn(request, context)
                self._measure_queue_size()
                return response_or_iterator

            return new_behavior

        return self._wrap_rpc_behavior(continuation(handler_call_details), wrapper)
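For context, here is roughly how it could be wired into a server. This is a minimal sketch: the serve() function, the port, the service registration, and the worker/concurrency numbers are placeholders, not part of the library.

from concurrent import futures

import grpc


def serve():
    # With maximum_concurrent_rpcs above max_workers, accepted RPCs that cannot
    # be picked up by a worker wait in the thread pool's work queue, which is
    # exactly what the gauge above would expose.
    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=10),
        interceptors=(ThreadPoolQueueMetricServerInterceptor(),),
        maximum_concurrent_rpcs=40,
    )
    # add_YourServiceServicer_to_server(YourServicer(), server)  # placeholder registration
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()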