Switch the order of operations in notify_handle so that the generation status is set before the last token is sent. This way the user never sees status RUNNING when generation has already finished and no new tokens will arrive.
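
The ordering can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: `GenerationHandle`, `GenerationStatus.FINISHED`, the `is_last` parameter, and the queue-based token delivery are all hypothetical stand-ins chosen for illustration; only `notify_handle` and the RUNNING status are named in this description.

```python
import queue
from enum import Enum


class GenerationStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"  # hypothetical name for the "done" state


class GenerationHandle:
    """Hypothetical stand-in for the object a user polls for tokens and status."""

    def __init__(self):
        self.status = GenerationStatus.RUNNING
        self.tokens = queue.Queue()


def notify_handle(handle, token, is_last):
    # Update the status first, so it is already visible by the time
    # the consumer receives the final token.
    if is_last:
        handle.status = GenerationStatus.FINISHED
    handle.tokens.put(token)


handle = GenerationHandle()

notify_handle(handle, 11, is_last=False)
first = (handle.tokens.get(), handle.status)

notify_handle(handle, 12, is_last=True)
last = (handle.tokens.get(), handle.status)
```

With the opposite order (token sent first, status set afterwards), a consumer running on another thread could receive the last token and still read RUNNING, then wait for tokens that never come.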
Add simple metrics that give basic information about the pipeline state, such as the total number of requests, the number of running requests, cache usage, etc.
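
As a rough sketch of what such a metrics snapshot might look like, the Python example below collects the three values mentioned above. The names `PipelineMetrics` and `collect_metrics`, the field names, the string request states, and the choice to report cache usage as a percentage of used blocks are assumptions made for illustration, not the interface added by this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMetrics:
    """Hypothetical read-only snapshot of pipeline state."""

    requests: int          # all requests known to the pipeline
    running_requests: int  # requests currently generating tokens
    cache_usage: float     # assumed here: percentage of cache blocks in use


def collect_metrics(request_states, used_blocks, total_blocks):
    # Count only the requests that are actively running.
    running = sum(1 for state in request_states if state == "running")
    # Guard against an empty cache to avoid division by zero.
    usage = 100.0 * used_blocks / total_blocks if total_blocks else 0.0
    return PipelineMetrics(len(request_states), running, usage)


metrics = collect_metrics(
    ["running", "waiting", "running"], used_blocks=25, total_blocks=100
)
```

Returning an immutable snapshot keeps the reported values consistent with each other even if the pipeline state changes right after the call.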
Changes: