Feature request

So that async, ephemeral tasks can push results to Prometheus without conflict. Data is being lost and I don't know why.
Bug Report
**What did you do? Under which circumstances?**
I have a Flask endpoint that is responsible for launching the following Celery task:
```python
# re, time, paramiko, and prometheus_client imports added for completeness;
# celery_app, logger, and the PROMETHEUS_*/VNF_* names come from the app's own modules.
import re
import time

import paramiko
from celery.contrib.abortable import AbortableTask
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


@celery_app.task(name="monitor_metrics", bind=True, base=AbortableTask)
def monitor_metrics(self, vnf_name, vnf_ip, vnf_user, vnf_pass, suite_id):
    push_gateway = f"{PROMETHEUS_PUSHGATEWAY}:{PROMETHEUS_PUSHGATEWAY_PORT}"

    # Set up the Prometheus gauge
    registry = CollectorRegistry()
    gauge = Gauge('infra_health_manager',
                  'A custom gauge for capturing VNF CPU, run by thyme-infra-health-manager',
                  ['instance', 'cpu'], registry=registry)

    # Initialize the SSH session
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(vnf_ip, username=vnf_user, password=vnf_pass)

    # Loop until the task is aborted, pushing metrics to the Pushgateway
    while not self.is_aborted():
        try:
            # Try to reuse the existing SSH session
            stdin, stdout, stderr = ssh.exec_command('top -b -n 1')
            output = stdout.read().decode('utf-8')
        except Exception as error:
            print(f"ERROR {error}")
            # Reinitialize the SSH session and retry
            ssh = paramiko.SSHClient()
            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            ssh.connect(vnf_ip, username=vnf_user, password=vnf_pass)
            stdin, stdout, stderr = ssh.exec_command('top -b -n 1')
            output = stdout.read().decode('utf-8')

        # Parse per-CPU usage out of the `top` output
        cpu_pattern = re.compile(r"%Cpu(\d+)\s+:\s+\d+\.\d+/\d+\.\d+\s+(\d+)")
        matches = cpu_pattern.findall(output)
        cpu_usage = {f"Cpu{match[0]}": int(match[1]) for match in matches}
        print(f"{vnf_name} CPU_USAGE: {cpu_usage}")

        # Set one sample per CPU and push the whole registry in one request
        for cpu in cpu_usage:
            gauge.labels(vnf_name, cpu).set(cpu_usage[cpu])
        push_to_gateway(push_gateway, job=suite_id, registry=registry)
        time.sleep(VNF_MONITOR_FREQUENCY)

    logger.info(f"Stopping {vnf_name} CPU Monitor")
    return True
```
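For context, the Flask endpoint that launches the task is shaped roughly like this (a minimal sketch; the route, payload fields, and response are illustrative, not my actual code):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/monitor/<suite_id>', methods=['POST'])
def start_monitor(suite_id):
    # One abortable monitor task is launched per VNF.
    body = request.get_json()
    result = monitor_metrics.delay(body['vnf_name'], body['vnf_ip'],
                                   body['vnf_user'], body['vnf_pass'],
                                   suite_id)
    return jsonify({'task_id': result.id}), 202
```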
Each task is supposed to monitor a unique VM and collect its CPU data. Due to certain requirements, this is the only way to get the CPU data out of the VM.

The metrics appear perfectly in Prometheus when only one task is running. When a second task is launched, the metrics stored in Prometheus become very spotty. There appears to be some conflict, but I am not clear on what it is.

Looking at the logs of my Celery task, I can see that the SSH command is succeeding and returning the correct CPU numbers. I have looked into the grouping_key, job, and instance documentation, but it is very sparse. I have tried a few changes, with no success. For a little more background: the job represents a unique ID of a given testing cycle, the instance is the name of a VM, and each instance may have 2-8 CPUs.
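For reference, this is the kind of change I tried after reading about grouping_key (a minimal sketch, not the exact code from my app; push_cpu_sample is an illustrative helper made up for this report):

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def push_cpu_sample(gateway, suite_id, vnf_name, cpu_usage):
    """Push one VM's CPU readings into its own Pushgateway group."""
    registry = CollectorRegistry()
    gauge = Gauge('infra_health_manager',
                  'VNF CPU usage pushed by thyme-infra-health-manager',
                  ['instance', 'cpu'], registry=registry)
    for cpu, value in cpu_usage.items():
        gauge.labels(vnf_name, cpu).set(value)
    # The intent of the grouping_key is to separate this task's group
    # from other tasks that push with the same job (suite_id).
    push_to_gateway(gateway, job=suite_id, registry=registry,
                    grouping_key={'instance': vnf_name})
```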
Environment

* System information:
* Pushgateway version:
* Pushgateway command line:
* Logs: