princeton-sns / firecracker-tools


Finer grain utilization measurement #23

Closed LedgeDash closed 5 years ago

LedgeDash commented 5 years ago

Currently, the stats we collect from the controller are high-level, coarse-grain stats (number of requests completed and dropped). Additionally, we need finer-grain time measurements. In particular, measurements should include:

  1. scheduling decision latency
  2. VM boot latency
  3. eviction latency
  4. communication latency

With those time measurements, we can then calculate system utilization.
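
A minimal sketch of how the controller could record these per VM (struct and field names are illustrative, not from the code base):

```rust
use std::time::Duration;

/// Hypothetical per-VM timing record.
struct VmTimings {
    boot_latency: Duration,     // scheduling decision -> VM booted
    comm_latency: Duration,     // VM booted -> app received the request JSON
    eviction_latency: Duration, // eviction started -> resources reclaimed
}

impl VmTimings {
    /// Time this VM spent on overhead rather than running app code.
    fn overhead(&self) -> Duration {
        self.boot_latency + self.comm_latency + self.eviction_latency
    }
}
```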

tan-yue commented 5 years ago

@LedgeDash would you mind writing down here the formula of system utilization in terms of the four latencies?

LedgeDash commented 5 years ago

This is something I want to discuss with y'all. One way we could measure this is:

  1. Measure total time from when the 1st request is scheduled to when all requests finish
  2. For every VM ever created, output its boot latency
  3. For every VM evicted, output its eviction latency

I think all of these can be done reasonably easily. But I'm not sure how to measure communication latency (i.e., from the point when a VM is booted to when the application receives the request JSON from the tty).

In terms of a formula, I think it would look something like this: utilization = (resources running app code x time running app code) / (total resources x total time). To get the numerator, we would keep track of each VM, measuring its boot latency and eviction latency; the time running app code for that VM then equals its total time - boot latency x number of boots - eviction latency x number of evictions.
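
As a rough sketch, that formula could be evaluated from per-VM records like this (all names are illustrative; "resources" could be vCPUs or MiB of memory):

```rust
use std::time::Duration;

/// Hypothetical per-VM accounting for the utilization formula above.
struct VmAccounting {
    resources: f64,             // e.g. vCPUs (or MiB) assigned to this VM
    lifetime: Duration,         // total wall-clock time the VM existed
    boot_latency: Duration,     // cost of one boot
    num_boots: u32,
    eviction_latency: Duration, // cost of one eviction
    num_evictions: u32,
}

/// utilization = sum_i(resources_i * time_running_app_code_i) / (total_resources * total_time)
fn utilization(vms: &[VmAccounting], total_resources: f64, total_time: Duration) -> f64 {
    let numerator: f64 = vms
        .iter()
        .map(|vm| {
            let overhead =
                vm.boot_latency * vm.num_boots + vm.eviction_latency * vm.num_evictions;
            let app_time = vm.lifetime.saturating_sub(overhead);
            vm.resources * app_time.as_secs_f64()
        })
        .sum();
    numerator / (total_resources * total_time.as_secs_f64())
}
```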

The scheduling latency is not part of the formula (sorry, I should have been clearer). But it gives us a lower bound on inter-arrival time, i.e., if scheduling latency is 20ms, then the inter-arrival time between requests is at least 20ms. This affects how we pick applications. In the 20ms example, we don't want too many applications that run for less than 20ms, because each of those needs only one VM to service all of its requests.

alevy commented 5 years ago

We should be able to measure time_running_app_code nearly directly by measuring the time between sending the request to the VM over the pipe and the result being printed out. For each application, we would expect this to be very close with and without snapshotting.

Alternatively, we could modify the runtime libraries to compute the time running requests directly inside the vm, and append it to the JSON response.
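
A minimal sketch of the first option, assuming the controller already holds pipe handles to the VM (`to_vm` / `from_vm` are illustrative names):

```rust
use std::io::{BufRead, Write};
use std::time::{Duration, Instant};

// Timestamp right before writing the request to the VM's pipe and again when the
// result line comes back; the elapsed time approximates time_running_app_code
// (plus pipe overhead).
fn time_request<W: Write, R: BufRead>(
    to_vm: &mut W,
    from_vm: &mut R,
    request_json: &str,
) -> std::io::Result<(String, Duration)> {
    let start = Instant::now();
    writeln!(to_vm, "{}", request_json)?; // send the request over the pipe
    let mut response = String::new();
    from_vm.read_line(&mut response)?; // block until the VM prints its result
    Ok((response, start.elapsed()))
}
```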

alevy commented 5 years ago

> The scheduling latency is not part of the formula (sorry, I should have been clearer). But it gives us a lower bound on inter-arrival time, i.e., if scheduling latency is 20ms, then the inter-arrival time between requests is at least 20ms. This affects how we pick applications. In the 20ms example, we don't want too many applications that run for less than 20ms, because each of those needs only one VM to service all of its requests.

I don't understand this ^

LedgeDash commented 5 years ago

I was trying to say 2 things.

  1. Scheduling latency is not part of the utilization formula
  2. With a single-threaded scheduler (what we have) reading requests from a file, the scheduling latency is the lower bound on request inter-arrival time. If scheduling latency is 20ms and the actual request interval is 1ms, then requests will hit the cluster with a 21ms inter-arrival time. Moreover, this also means that if a function takes only 10ms (<21ms) to run, at equilibrium it will only need 1 VM. So if all our functions are shorter than the scheduling latency, we'll need many different functions in order to saturate cluster resources (see the sketch below). That's all.
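
A back-of-the-envelope sketch of that last point, using the numbers from the example above:

```rust
// At equilibrium a function needs roughly ceil(runtime / inter_arrival) concurrent VMs,
// where inter_arrival = scheduling latency + actual request interval.
fn vms_needed(runtime_ms: f64, scheduling_latency_ms: f64, request_interval_ms: f64) -> u64 {
    let inter_arrival = scheduling_latency_ms + request_interval_ms;
    (runtime_ms / inter_arrival).ceil() as u64
}

fn main() {
    // 10ms function, 20ms scheduling latency, 1ms request interval -> 1 VM
    println!("{}", vms_needed(10.0, 20.0, 1.0));
    // 100ms function under the same load -> 5 VMs
    println!("{}", vms_needed(100.0, 20.0, 1.0));
}
```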