utoronto-2i2c / jupyterhub-deploy

Demo JupyterHub deployment for University of Toronto
BSD 3-Clause "New" or "Revised" License

Cost Optimization of JupyterHub Service #92

Open VijayMaraviya opened 3 years ago

VijayMaraviya commented 3 years ago

Hi @yuvipanda,

[Continuing our discussion from Team last week...] Broadly, the idea is to simulate the cluster operation (via Discrete Event Simulation) to find optimal system parameters (node type, guarantee ratio, etc.) or optimal policies (how to configure node pools, etc., to exploit patterns in user behavior) that minimize cost while providing a certain performance guarantee.
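To make the simulation idea concrete, here is a minimal sketch of a discrete event simulation in Python. It is only an illustration under strong assumptions, not the actual model: arrivals are assumed to be Poisson, session lengths exponential, and every node is assumed to hold a fixed number of users. The function name `simulate_peak_nodes` and all of its parameters are hypothetical.

```python
import heapq
import math
import random


def simulate_peak_nodes(arrival_rate, mean_session, users_per_node, horizon, seed=0):
    """Toy discrete event simulation of users joining and leaving a hub.

    Assumptions (illustrative only): Poisson arrivals at `arrival_rate` per
    time unit, exponential session lengths with mean `mean_session`, and a
    fixed capacity of `users_per_node` users on every node.
    Returns the peak number of nodes needed over the simulated horizon.
    """
    rng = random.Random(seed)
    events = []  # min-heap of (time, delta); delta is +1 arrival, -1 departure

    # Schedule every arrival before `horizon`, plus its matching departure.
    t = rng.expovariate(arrival_rate)
    while t < horizon:
        heapq.heappush(events, (t, +1))
        heapq.heappush(events, (t + rng.expovariate(1.0 / mean_session), -1))
        t += rng.expovariate(arrival_rate)

    # Process events in time order and track the peak number of active users.
    active = peak_users = 0
    while events:
        _, delta = heapq.heappop(events)
        active += delta
        peak_users = max(peak_users, active)

    return math.ceil(peak_users / users_per_node)
```

Calling it with different `users_per_node` values (standing in for different node types) and attaching a cost per node would give a first, very rough comparison. A realistic model would replace the assumed distributions with ones fitted to the real arrival and departure data.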

Right now, I am learning about Prometheus and Grafana. I couldn't find the Explore option in Grafana. Could you please help me with how to get the data? I am looking for data of users' arrival and departure from the system.

yuvipanda commented 3 years ago

Thanks for opening this, @VijayMaraviya.

> I am looking for data of users' arrival and departure from the system.

Grafana and Prometheus usually collect aggregate time series data, so you can ask for things like 'at this point in time, how many users were on the system?' rather than 'what happened at this minute?'. Reading https://thenewstack.io/what-is-the-difference-between-metrics-and-events/ might help clarify the difference between metrics (which is what Grafana has) and events (which are what will provide the data you are looking for). If this is the specific dataset you want, I can look at the logs and produce that for you.
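A small sketch of the difference, in Python: given an event log, both the metric (how many users were active at a given time) and the per-user arrival and departure times can be derived, whereas the metric alone cannot be turned back into individual sessions. The log format, the user names, and the timestamps below are made up for illustration and are not what the hub's logs actually look like.

```python
from datetime import datetime

# Hypothetical event log: (timestamp, user, action). Made-up data.
events = [
    (datetime(2021, 9, 1, 9, 0), "alice", "start"),
    (datetime(2021, 9, 1, 9, 5), "bob", "start"),
    (datetime(2021, 9, 1, 9, 40), "alice", "stop"),
    (datetime(2021, 9, 1, 10, 15), "bob", "stop"),
]


def active_users_at(events, when):
    """Metric view: number of users with a running server at time `when`."""
    active = set()
    for ts, user, action in sorted(events):
        if ts > when:
            break
        if action == "start":
            active.add(user)
        else:
            active.discard(user)
    return len(active)


def sessions(events):
    """Event view: one (user, arrival, departure) tuple per completed session."""
    open_sessions = {}
    completed = []
    for ts, user, action in sorted(events):
        if action == "start":
            open_sessions[user] = ts
        elif user in open_sessions:
            completed.append((user, open_sessions.pop(user), ts))
    return completed
```

Here `active_users_at` reproduces what a dashboard panel would show, while `sessions` yields the arrival and departure pairs a simulation would need as input.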

For optimization, I'd suggest reading up on the current work being done before starting. Some prior reading that might be useful:

I hope that getting an awareness of the current state of implementable solutions helps form your research questions.