I would prefer to store data in CSV format (or JSON) to be able to connect it with `collected_results.csv` later.
After some investigation, it looks like more thought will be needed on how the monitored data should be structured and saved. A simple CSV works well for a single metric:
```
monitored_date,db_connections_number
2021-11-16T10:24:12,15
2021-11-16T10:25:12,20
```
If we want to add workflow statuses, it gets a bit more complicated:
```
monitored_date,db_connections_number,status,count
2021-11-16T10:24:12,15,running,5
2021-11-16T10:24:12,15,pending,2
2021-11-16T10:25:12,20,running,6
2021-11-16T10:25:12,20,pending,1
```
If we want to add pod statuses, it gets even more complicated:
```
monitored_date,db_connections_number,status,count,type,type_count
2021-11-16T10:24:12,15,running,5,run-b,5
2021-11-16T10:24:12,15,running,5,run-j,2
...
```
Splitting the data into multiple CSV files could help, but it would introduce more complexity in the `analyze` command, which would have to merge them back together. A JSON structure keyed by timestamp looks more natural:
```json
{
    "2021-11-16T10:24:12": {
        "db_connections_number": 15,
        "workflow_statuses": {
            "running": 5,
            "pending": 2
        }
    }
}
```
This is a more flexible approach. It is also possible to extend the file with new metrics by simply adding a new entry under the `2021-11-16T10:24:12` key. In the `analyze` command, I can then use the key (the timestamp) to plot the metrics; see the sketch below.
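For illustration, here is a minimal sketch of appending one monitoring snapshot to such a JSON file; the file name `monitored_results.json` and the `save_snapshot` helper are hypothetical, not anything decided in this issue:

```python
import json
from datetime import datetime
from pathlib import Path

# Hypothetical file name; the real name is not fixed in this issue.
MONITORED_RESULTS_PATH = Path("monitored_results.json")


def save_snapshot(db_connections_number, workflow_statuses):
    """Append one timestamped snapshot of monitored metrics to the JSON file."""
    results = {}
    if MONITORED_RESULTS_PATH.exists():
        results = json.loads(MONITORED_RESULTS_PATH.read_text())
    timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
    results[timestamp] = {
        "db_connections_number": db_connections_number,
        "workflow_statuses": workflow_statuses,
    }
    MONITORED_RESULTS_PATH.write_text(json.dumps(results, indent=4))


# Produces exactly the structure shown above.
save_snapshot(15, {"running": 5, "pending": 2})
```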
P.S. While writing down my findings, I realized that JSON looks like a good idea. Writing stuff down helps a lot :)
P.P.S. This whole problem of how to save the data is the classical "structured vs. unstructured data" debate.
Suggestion: the point of this issue is to develop the `monitor` command only. I will add another issue that will focus on how the `analyze` command will use the monitored data and plot it alongside what we already have.
Another thing: I will use `subprocess` to execute commands and parse their output. It may not be as efficient as using an API (such as the Python Kubernetes client library), but it is simpler to start with; we can improve later if needed. A rough sketch of the parsing is shown below.
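For example, a sketch of counting pod statuses by parsing `kubectl get pods` output via `subprocess`; the function name and namespace handling are assumptions, not a decided interface:

```python
import subprocess
from collections import Counter


def get_pod_statuses(namespace="default"):
    """Count pod statuses by shelling out to kubectl and parsing its output."""
    output = subprocess.check_output(
        ["kubectl", "get", "pods", "-n", namespace, "--no-headers"],
        text=True,
    )
    statuses = Counter()
    for line in output.splitlines():
        # `kubectl --no-headers` columns: NAME READY STATUS RESTARTS AGE
        columns = line.split()
        if len(columns) >= 3:
            statuses[columns[2]] += 1
    return dict(statuses)
```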
(stems from https://github.com/reanahub/reana/pull/541#discussion_r716529942)
Current behaviour
Currently, while running the benchmarking script, one can monitor the DB status and the K8S status independently via a script like the one sketched below:
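For instance (the `reana-db` deployment name, the DB user, and the connection-count query below are illustrative assumptions, not the actual script):

```python
import subprocess
import time

while True:
    # Number of open DB connections, queried from the PostgreSQL pod.
    # (Deployment name, user, and query are illustrative assumptions.)
    db_connections = subprocess.check_output(
        [
            "kubectl", "exec", "deployment/reana-db", "--",
            "psql", "-U", "reana", "-t", "-c",
            "SELECT count(*) FROM pg_stat_activity;",
        ],
        text=True,
    ).strip()
    # Current pod listing, to compare K8S statuses with the DB picture.
    pods = subprocess.check_output(
        ["kubectl", "get", "pods", "--no-headers"], text=True
    )
    print(time.strftime("%Y-%m-%dT%H:%M:%S"), "DB connections:", db_connections)
    print(pods)
    time.sleep(30)
```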
This gives output snapshots for particular moments of time, e.g. one now and another 30 seconds later.
These time snapshots allow one to monitor the number of DB connections, the DB statuses vs the K8S statuses, and the number of "Running" pods vs "Pending" pods, and to see how fast pods terminate, etc., giving a complementary picture of what is happening in the cluster.
The trouble is that this "side" monitoring is a bit "detached" from the main output of the benchmark scripts. It would be advantageous to correlate this information better with the workflow burn-down plots.
Expected behaviour
We can introduce a new command:

```
monitor --sleep 30
```

which would do the above automatically and collect the information either in the textual format above (MVP) or, even better, in a CSV format that would allow plotting nice DB and K8S status evolution graphs later, showing how the measured DB and K8S quantities evolve as a function of time. For example, once #573 is implemented, we shall have a "real-time arrow" representation of the workflow burn-down in the cluster, and the DB info plots and K8S info plots will nicely complement the overall picture of what is happening in the cluster.
They might also give graphical insight into the "orange hill" and "blue spread" phenomena, such as the transition of workflow pods through the "Running -> NotReady -> Terminating" statuses. A minimal sketch of what the `monitor` command could look like follows.
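For example, assuming a click-based CLI (an assumption, as are the `monitored_results.csv` file name and its columns), the command could look roughly like this:

```python
import csv
import subprocess
import time
from collections import Counter
from datetime import datetime

import click


@click.command()
@click.option("--sleep", default=30, help="Seconds between monitoring snapshots.")
def monitor(sleep):
    """Periodically record K8S pod statuses into a CSV file (illustrative sketch)."""
    with open("monitored_results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["monitored_date", "status", "count"])
        while True:
            # One snapshot: count pods per status from `kubectl get pods`.
            output = subprocess.check_output(
                ["kubectl", "get", "pods", "--no-headers"], text=True
            )
            statuses = Counter()
            for line in output.splitlines():
                columns = line.split()  # NAME READY STATUS RESTARTS AGE
                if len(columns) >= 3:
                    statuses[columns[2]] += 1
            now = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
            for status, count in statuses.items():
                writer.writerow([now, status, count])
            f.flush()  # Keep the file readable while monitoring is running.
            time.sleep(sleep)


if __name__ == "__main__":
    monitor()
```

Writing one row per (timestamp, status) pair matches the "long" CSV layout discussed above and should be straightforward to merge and plot in `analyze`.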