Cost Optimization of JupyterHub Service

Thanks for opening this, @VijayMaraviya.

> I am looking for data of users' arrival and departure from the system.

Grafana and Prometheus usually collect aggregate time-series data, so you can ask things like 'at this point in time, how many users were on the system?' but not 'what happened at this minute?'. Reading https://thenewstack.io/what-is-the-difference-between-metrics-and-events/ might help clarify the difference between metrics (which is what Grafana has) and events (which is what will provide the data you are looking for). If this is the specific dataset you want, I can look at the logs and produce that for you.
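To make the metrics vs events distinction concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the usernames, the timestamps, and the function names `concurrent_users` and `events_between` are hypothetical and not taken from our logs or from any Grafana/Prometheus API. It only shows that event records (arrivals and departures) can answer both kinds of question, while an aggregate count alone cannot be turned back into events.

```python
from datetime import datetime

# Hypothetical event records: (username, arrival, departure).
# Real data would have to be extracted from the hub logs.
sessions = [
    ("alice", datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 10, 30)),
    ("bob", datetime(2021, 3, 1, 9, 45), datetime(2021, 3, 1, 11, 0)),
    ("carol", datetime(2021, 3, 1, 10, 45), datetime(2021, 3, 1, 12, 0)),
]


def concurrent_users(sessions, at):
    """Metric-style question: how many users were on the system at time `at`?"""
    return sum(1 for _, start, end in sessions if start <= at < end)


def events_between(sessions, start, end):
    """Event-style question: which arrivals/departures happened in [start, end)?"""
    events = []
    for user, arrival, departure in sessions:
        if start <= arrival < end:
            events.append((arrival, user, "arrival"))
        if start <= departure < end:
            events.append((departure, user, "departure"))
    return sorted(events)  # ordered by timestamp


if __name__ == "__main__":
    print(concurrent_users(sessions, datetime(2021, 3, 1, 10, 0)))
    for when, user, kind in events_between(
        sessions, datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 1, 11, 0)
    ):
        print(when.isoformat(), user, kind)
```

The first function is roughly the kind of answer a dashboard gives; the second needs the per-user records, which is why the logs are the place to look for arrival and departure data.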

For optimization, I'd suggest reading up on the current work being done before starting. Some prior reading that might be useful:

https://github.com/kubernetes/autoscaler, which automatically sizes our cluster based on need. Configuring this is what ultimately optimizes cost, so understanding its limitations is vital for judging which changes would actually be useful.
https://zero-to-jupyterhub.readthedocs.io/en/stable/administrator/optimization.html talks about how JupyterHub uses the autoscaler.
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/search?q=placeholder&type=issues lists current conversation about 'pod placeholders', which is what JupyterHub uses to signal the autoscaler. This is what's in our control to modify, and what the JupyterHub community is actively working on.
https://discourse.jupyter.org/t/request-for-implementation-jupyterhub-aware-kubernetes-cluster-autoscaler/7669/15 was the result of the last time I looked at deep optimization.

I hope that getting an awareness of the current state of implementable solutions helps form your research questions.
