usegalaxy-eu / tpv-metascheduler-api

Metascheduler for TPV as Service
MIT License

Additional stats for tpv-api #10

Closed pauldg closed 3 months ago

pauldg commented 6 months ago

As mostly discussed in #6, we want the API itself to collect additional stats (rather than going through TPV).

For this to work we need:

The metrics we would want/need:

Static information of the job:

This information is directly available in TPV

Dynamic information about the job, expressed as the combination of tool and destination (frequency: daily):

$ gxadmin query destination-queue-run-time --seconds --older-than=90

 destination_id | tool_id         | count | avg             | min             | median_queue    | perc_95_queue   | perc_99_queue   | max             | avg             | min             | median_run      | perc_95_run     | perc_99_run     | max
----------------+-----------------+-------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------
 condor_tpv     | Show beginning1 |     4 | 00:00:42.190985 | 00:00:41.921395 | 00:00:42.197197 | 00:00:42.419296 | 00:00:42.44238  | 00:00:42.448151 | 00:00:15.742914 | 00:00:12.202032 | 00:00:14.23126  | 00:00:21.398603 | 00:00:22.125406 | 00:00:22.307107
 pulsar_be_tpv  | Show beginning1 |    30 | 00:00:03.752367 | 00:00:00.828691 | 00:00:01.227039 | 00:00:15.833112 | 00:00:19.567223 | 00:00:20.822238 | 00:00:12.835613 | 00:00:07.343693 | 00:00:08.121174 | 00:00:39.547216 | 00:00:47.76481  | 00:00:48.391205
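
A minimal sketch of how the API side might turn this output into records. It assumes the psql-style text layout shown above and `HH:MM:SS[.ffffff]` interval values; the function names `parse_interval` and `parse_psql_table` are hypothetical and not part of gxadmin or TPV. Because the header repeats `avg`, `min` and `max` for the queue and run groups, the sketch appends a numeric suffix to repeated names (`avg_2` and so on), which is an illustrative choice only.

```python
def parse_interval(text: str) -> float:
    """Convert an 'HH:MM:SS[.ffffff]' interval string to seconds."""
    hours, minutes, seconds = text.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_psql_table(output: str) -> list[dict[str, str]]:
    """Parse a psql-style text table into a list of {column: value} dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []

    # Repeated column names get a numeric suffix: avg, avg_2, ...
    seen: dict[str, int] = {}
    names: list[str] = []
    for name in (cell.strip() for cell in lines[0].split("|")):
        seen[name] = seen.get(name, 0) + 1
        names.append(name if seen[name] == 1 else f"{name}_{seen[name]}")

    rows = []
    for line in lines[1:]:
        stripped = line.strip()
        # Skip the '----+----' separator and a '(N rows)' footer if present.
        if set(stripped) <= {"-", "+"} or stripped.startswith("("):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(names, cells)))
    return rows
```

With the table above, `rows[0]["avg"]` would hold the queue-time average and `rows[0]["avg_2"]` the run-time average; `parse_interval` converts either to seconds. The sketch splits on `|`, so it would misparse a tool ID that itself contains that character.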

Dynamic information about the destination (frequency: 30 mins):

$ gxadmin query queue --by destination

 destination_id | state   | destination_count
----------------+---------+-------------------
 pulsar_be_tpv  | queued  |                 4
 pulsar_be_tpv  | running |                 6
 condor_tpv     | queued  |                 8
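
For scheduling decisions it may be handier to have one entry per destination instead of one row per destination and state. The following is only a sketch under that assumption: `summarize_queue` is a hypothetical helper, and it takes rows already split into `(destination_id, state, destination_count)` tuples.

```python
from collections import defaultdict


def summarize_queue(rows):
    """Pivot (destination_id, state, count) rows into per-destination counts."""
    # Every destination starts with both states at zero, so a destination
    # that has only queued jobs still reports running = 0.
    summary = defaultdict(lambda: {"queued": 0, "running": 0})
    for destination_id, state, count in rows:
        per_destination = summary[destination_id]
        per_destination[state] = per_destination.get(state, 0) + int(count)
    return dict(summary)
```

For the three rows above this would give 4 queued and 6 running jobs for `pulsar_be_tpv`, and 8 queued and 0 running jobs for `condor_tpv`.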

Shell scripts that give an overview of the cluster allocation and availability in an InfluxDB-compatible format.
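
To illustrate what "InfluxDB-compatible" output could look like, here is a small sketch that formats one data point in InfluxDB line protocol (`measurement,tag=value field=value [timestamp]`). It is written in Python rather than shell purely for illustration. The measurement name `cluster_allocation` and the fields `cores_total` and `cores_allocated` used in the usage note are assumptions; the issue does not define the actual schema.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape the characters that line protocol treats as special."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value) -> str:
    """Render a field value: booleans, integers (with 'i' suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp=None) -> str:
    """Format a single point as one line of InfluxDB line protocol."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{_escape(key, ',= ')}={_escape(str(val), ',= ')}"
        for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(key, ',= ')}={_format_field(val)}" for key, val in fields.items()
    )
    line = f"{_escape(measurement, ', ')}{tag_part} {field_part}"
    if timestamp is not None:
        line += f" {timestamp}"
    return line
```

For example, `to_line_protocol("cluster_allocation", {"destination_id": "condor_tpv"}, {"cores_total": 128, "cores_allocated": 96})` produces `cluster_allocation,destination_id=condor_tpv cores_total=128i,cores_allocated=96i`. The numbers here are made up for the example.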

Note: We need a plan for configuring the remote/Pulsar destinations to ship the data to InfluxDB. The EU could bake the required credentials and push scripts into the Pulsar images so that the data goes to the EU's InfluxDB. This way, we do not have to set up a dedicated resource for this and can use what the EU already has.
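
As a rough sketch of what such a push script might do, the following builds (but does not send) an HTTP write request. It assumes an InfluxDB 1.x-style `/write` endpoint with `db` and `precision` query parameters and HTTP basic authentication; whether the EU instance uses this API version is not stated in this issue. The helper name `build_write_request`, the host, the database name and the credentials in the usage note are all placeholders.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, database, lines, username, password, precision="s"):
    """Build a POST request that writes line-protocol lines to InfluxDB 1.x."""
    query = urlencode({"db": database, "precision": precision})
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url.rstrip('/')}/write?{query}",
        # One point per line, separated by newlines.
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
```

A caller would pass the returned object to `urllib.request.urlopen(request, timeout=10)`, for example with `build_write_request("https://influx.example.org:8086", "tpv", lines, "user", "secret")`. In a baked image, the credentials would more likely come from a file or environment variables than from literals in the script.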