Open eseliger opened 3 years ago
Adding to this: from a batch change user standpoint, as soon as a user starts running large-scale, complex jobs (think an AST-based tool that requires the JVM) over hundreds of repositories, they will want to know what the bottleneck for execution speed is (CPU, network, memory, etc.).
Pasting this link here so we don't forget: we will likely need the Pushgateway to capture Firecracker VM metrics.
Heads up @macraig - the "team/code-intelligence" label was applied to this issue.
At some point, we want to be able to track resource usage per VM, not just per executor compute instance. To do that, we probably want to run a node_exporter inside the VM, scrape the data from there, and forward it in some way. Ideally, the data would end up not only in Prometheus/Grafana but also somewhere we can show it to users. This would let us drill down on performance problems to find out whether CPU or memory is the bottleneck. It would also help us make more informed decisions about resource allocations.
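To make the "scrape and forward" idea a bit more concrete, here is a rough Python sketch of what a forwarder could look like. It is not how the executor works today, and everything in it is an assumption: that node_exporter listens on its default port 9100 and is reachable from wherever the forwarder runs, that a Pushgateway is available on its default port 9091, that VM IDs contain no slashes, and that the names `executor_vm`, `filter_metrics`, `push_url`, and `forward` are purely illustrative. It keeps only the CPU, memory, and network series and pushes them under a per-VM grouping key.

```python
import urllib.request
from urllib.parse import quote

# Assumptions: default ports, both endpoints reachable from the forwarder.
NODE_EXPORTER_URL = "http://127.0.0.1:9100/metrics"
PUSHGATEWAY_URL = "http://127.0.0.1:9091"

# Metric name prefixes relevant to the CPU / memory / network question.
KEEP_PREFIXES = ("node_cpu_seconds_total", "node_memory_", "node_network_")


def filter_metrics(exposition: str, prefixes=KEEP_PREFIXES) -> str:
    """Keep samples (and their HELP/TYPE lines) whose name matches a prefix."""
    kept = []
    for line in exposition.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Metadata lines look like "# HELP <name> ..." or "# TYPE <name> ...".
            parts = line.split(None, 3)
            is_meta = len(parts) >= 3 and parts[1] in ("HELP", "TYPE")
            name = parts[2] if is_meta else ""
        else:
            # Sample lines look like "<name>{labels} <value>" or "<name> <value>".
            name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefixes):
            kept.append(line)
    return "".join(item + "\n" for item in kept)


def push_url(base: str, job: str, vm_id: str) -> str:
    """Build a Pushgateway URL that groups metrics by job and VM instance."""
    return (
        f"{base.rstrip('/')}/metrics/job/{quote(job, safe='')}"
        f"/instance/{quote(vm_id, safe='')}"
    )


def forward(vm_id: str, job: str = "executor_vm") -> int:
    """Scrape node_exporter once and push the filtered metrics for this VM."""
    with urllib.request.urlopen(NODE_EXPORTER_URL, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    request = urllib.request.Request(
        push_url(PUSHGATEWAY_URL, job, vm_id),
        data=filter_metrics(body).encode("utf-8"),
        method="PUT",  # PUT replaces all metrics previously pushed for this group
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

Something like `forward("vm-1")` would then run on an interval for the lifetime of each VM. Because the VM ID ends up as the `instance` label, Prometheus could tell VMs on the same executor apart, and a user-facing view could query by that label. Since VMs are short-lived, the pushed group would probably also need to be deleted once the job finishes.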