tailhook / zerogw

A fast HTTP/WebSocket to zeromq gateway (UNMAINTAINED, take a look at swindon web server instead)
http://zerogw.com
MIT License

measure response time #31

Open timurb opened 9 years ago

timurb commented 9 years ago

It would be nice if there was a stats counter for some kind of response time, for example, average response time since the last time stats were taken.

tailhook commented 9 years ago

While it's possible to sum up the total time of requests (so you can divide by the request count and get an average), before you can get a useful value you should answer the following questions:

  1. Do requests that timed out contribute to the average? (Timeouts fall into two cases: (a) transient network failures that TCP isn't able to recover from quickly, and (b) backend slowness, where the backend usually hangs for longer than the timeout. Neither case adds a meaningful value to the result; on the other hand, the client is waiting all this time.)
  2. Is the average really meaningful enough for you? (Statistically, the median or another percentile is often more useful.)
  3. Do static requests count? (They are usually very fast if served from cache, and very slow otherwise.)
  4. Do your requests tend to have the same mean time? (Usually there are very fast routes, such as GET /health_check, and very slow ones, such as POST /admin/whatever/save.)
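To illustrate questions 1 and 2, here is a minimal Python sketch of how a handful of timed-out requests can dominate the average while the median and a percentile barely move. The sample data, the 30-second timeout, and the `summarize` helper are all made up for illustration; none of this is part of zerogw.

```python
import statistics

def summarize(latencies_ms):
    """Return mean, median and nearest-rank 95th percentile of latencies."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer math.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }

# Hypothetical sample: 97 requests at 20 ms and 3 that hit a 30 s timeout.
sample = [20.0] * 97 + [30000.0] * 3
stats = summarize(sample)
print(stats)
```

With this sample the mean lands near 919 ms even though 97% of the requests took 20 ms, which is why a single "average response time" counter can be misleading.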

Another thing to consider is websockets. I believe many users use zerogw specifically for websockets. Zerogw processes websocket messages as publish-subscribe. While many applications do emulate request-reply on top of pub-sub, zerogw doesn't know anything about those requests, so it can't account for them.

At the end of the day, we usually measure request latency either at the backend, where we can differentiate between types of requests and so get a summary of backend performance per route, or on the client side, i.e. in JavaScript, to see end-user performance, so we can address network latency issues.
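As a rough sketch of the backend-side approach, something like the following could record latency per route. This is plain Python with no zerogw or zeromq specifics; the `timed` helper, the in-memory `latencies` store, and the two handler functions are hypothetical stand-ins for whatever your backend actually does.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Route name -> list of elapsed seconds (hypothetical in-memory store).
latencies = defaultdict(list)

@contextmanager
def timed(route):
    """Record wall-clock time spent handling one request for `route`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the handler raises, so failures are counted too.
        latencies[route].append(time.perf_counter() - start)

def handle_health_check():
    with timed("GET /health_check"):
        return "ok"

def handle_admin_save():
    with timed("POST /admin/whatever/save"):
        time.sleep(0.01)  # stand-in for slow work
        return "saved"

handle_health_check()
handle_admin_save()

for route, samples in latencies.items():
    print(route, len(samples), sum(samples) / len(samples))
```

Keeping the samples per route means fast and slow routes are never averaged together, and you can decide per route how to treat timeouts.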

Zerogw's contribution to latency is very small unless it uses ~100% CPU (which happens at about 30-100 thousand requests per second, depending on the CPU) or the machine's load average is too high. In that case just run multiple instances of zerogw.