mesos / mesos_exporter

Prometheus Mesos Exporter
Apache License 2.0

Added a 'hostname' label to all master metrics. #70

Closed: klueska closed this 6 years ago

klueska commented 6 years ago

Example output (filtering out the Descs):

mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="mean"} 0.011008
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="min"} 0.007168
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p50"} 0.011776
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p90"} 0.013056
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p95"} 0.01408
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p99"} 0.01796096
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p999"} 0.0409699839999998
mesos_master_allocation_run_latency_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p9999"} 0.0499465984000007
mesos_master_allocation_run_latency_ms_count{event="allocation",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1000
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="max"} 0.240128
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="mean"} 0.131072
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="min"} 0.103936
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p50"} 0.132096
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p90"} 0.155136
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p95"} 0.163072
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p99"} 0.19789824
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p999"} 0.231944192
mesos_master_allocation_run_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p9999"} 0.239309619200001
mesos_master_allocation_run_ms_count{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 22754
mesos_master_allocation_runs{event="allocation",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 22754
mesos_master_allocator_event_queue_dispatches{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_allocator_offer_filters_active{hostname="ip-10-0-6-97.us-west-2.compute.internal",role="*"} 1
mesos_master_allocator_offer_filters_active{hostname="ip-10-0-6-97.us-west-2.compute.internal",role="slave_public"} 1
mesos_master_allocator_resources_cpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="offered_or_allocated"} 0
mesos_master_allocator_resources_cpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 4
mesos_master_allocator_resources_disk{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="offered_or_allocated"} 0
mesos_master_allocator_resources_disk{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 138004
mesos_master_allocator_resources_mem{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="offered_or_allocated"} 0
mesos_master_allocator_resources_mem{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 15026
mesos_master_allocator_role_shares_dominant{hostname="ip-10-0-6-97.us-west-2.compute.internal",role="*"} 0
mesos_master_allocator_role_shares_dominant{hostname="ip-10-0-6-97.us-west-2.compute.internal",role="slave_public"} 0
mesos_master_cpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 4
mesos_master_cpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_cpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 4
mesos_master_cpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_cpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 0
mesos_master_cpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_cpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 0
mesos_master_cpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_disk{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 138004
mesos_master_disk{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_disk{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 138004
mesos_master_disk{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_disk_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 0
mesos_master_disk_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_disk_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 0
mesos_master_disk_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_elected{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1
mesos_master_event_queue_dispatches{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_event_queue_length{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="dispatches"} 2
mesos_master_event_queue_length{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="http_request"} 0
mesos_master_event_queue_length{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="message"} 0
mesos_master_frameworks_messages{framework="dcos_marathon",hostname="ip-10-0-6-97.us-west-2.compute.internal",type="processed"} 1555
mesos_master_frameworks_messages{framework="dcos_marathon",hostname="ip-10-0-6-97.us-west-2.compute.internal",type="received"} 1555
mesos_master_frameworks_messages{framework="dcos_metronome",hostname="ip-10-0-6-97.us-west-2.compute.internal",type="processed"} 212
mesos_master_frameworks_messages{framework="dcos_metronome",hostname="ip-10-0-6-97.us-west-2.compute.internal",type="received"} 212
mesos_master_frameworks_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="connected_active"} 2
mesos_master_frameworks_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="connected_inactive"} 0
mesos_master_frameworks_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="disconnected_inactive"} 0
mesos_master_gpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 0
mesos_master_gpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_gpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 0
mesos_master_gpus{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_gpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 0
mesos_master_gpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_gpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 0
mesos_master_gpus_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_mem{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 15026
mesos_master_mem{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_mem{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 15026
mesos_master_mem{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_mem_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="free"} 0
mesos_master_mem_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="percent"} 0
mesos_master_mem_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="total"} 0
mesos_master_mem_revocable{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="used"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="authenticate_messages"} 3
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="deactivate_framework"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="decline_offers"} 265
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="dropped_messages"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="executor_to_framework"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="exited_executor"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="framework_to_executor"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="kill_task"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="launch_tasks"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="reconcile_tasks"} 1499
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="register_framework"} 2
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="register_slave"} 1
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="reregister_framework"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="reregister_slave"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="resource_request"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="revive_offers"} 3
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="status_update"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="status_update_acknowledgement"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="suppress_offers"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="unregister_framework"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="unregister_slave"} 0
mesos_master_messages{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="update_slave"} 1
mesos_master_messages_outcomes_total{destination="executor",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="invalid",source="framework",type=""} 0
mesos_master_messages_outcomes_total{destination="executor",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="valid",source="framework",type=""} 0
mesos_master_messages_outcomes_total{destination="framework",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="invalid",source="executor",type=""} 0
mesos_master_messages_outcomes_total{destination="framework",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="invalid",source="slave",type="status_update"} 0
mesos_master_messages_outcomes_total{destination="framework",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="valid",source="executor",type=""} 0
mesos_master_messages_outcomes_total{destination="framework",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="valid",source="slave",type="status_update"} 0
mesos_master_messages_outcomes_total{destination="slave",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="invalid",source="framework",type="status_update"} 0
mesos_master_messages_outcomes_total{destination="slave",hostname="ip-10-0-6-97.us-west-2.compute.internal",outcome="valid",source="framework",type="status_update"} 0
mesos_master_offers_pending{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_recovery_slave_removal_events_total{event="removal",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_registration_events_total{event="register",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1
mesos_master_slave_registration_events_total{event="reregister",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_removal_events_reasons{hostname="ip-10-0-6-97.us-west-2.compute.internal",reason="registered"} 0
mesos_master_slave_removal_events_reasons{hostname="ip-10-0-6-97.us-west-2.compute.internal",reason="unhealthy"} 0
mesos_master_slave_removal_events_reasons{hostname="ip-10-0-6-97.us-west-2.compute.internal",reason="unregistered"} 0
mesos_master_slave_removal_events_total{event="canceled",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_removal_events_total{event="completed",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_removal_events_total{event="died",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_removal_events_total{event="scheduled",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_unreachable_events_total{event="canceled",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_unreachable_events_total{event="completed",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slave_unreachable_events_total{event="scheduled",hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_master_slaves_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="connected_active"} 1
mesos_master_slaves_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="connected_inactive"} 0
mesos_master_slaves_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="disconnected_inactive"} 0
mesos_master_slaves_state{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="unreachable"} 0
mesos_master_task_states_current{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="running"} 0
mesos_master_task_states_current{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="staging"} 0
mesos_master_task_states_current{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="starting"} 0
mesos_master_task_states_current{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="unreachable"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="dropped"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="errored"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="failed"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="finished"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="gone"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="gone_by_operator"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="killed"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="killing"} 0
mesos_master_task_states_exit_total{hostname="ip-10-0-6-97.us-west-2.compute.internal",state="lost"} 0
mesos_master_uptime_seconds{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 22766.873231104
mesos_overlay_log_ensemble_size{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1
mesos_overlay_log_recovered{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1
mesos_registrar_log_ensemble_size{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1
mesos_registrar_log_recovered{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 1
mesos_registrar_queued_operations{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 0
mesos_registrar_registry_size_bytes{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 408
mesos_registrar_state_fetch_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal"} 20.161024
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="max"} 6.902016
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="mean"} 4.739072
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="min"} 4.739072
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p50"} 5.820544
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p90"} 6.6857216
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p95"} 6.7938688
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p99"} 6.88038656
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p999"} 6.899853056
mesos_registrar_state_store_ms{hostname="ip-10-0-6-97.us-west-2.compute.internal",type="p9999"} 6.9017997056

philipnrmn commented 6 years ago

Fixes DCOS-35750 - mesos-exporter should tag metrics with the source host name

MrMarvin commented 6 years ago

I'm not sure hostname is the best name for that tag, given our history of using it for the field Mesos uses internally. Ideally the two would be the same thing; however, our current default in DC/OS is to force the use of internal IPv4 addresses as Mesos 'hostnames' (see e.g. https://github.com/mesosphere/soak-cluster-deployers/pull/11 for an approach that makes use of real hostnames instead). What we did in the past for everything soak-cluster related was to introduce a new tag/label called host, which holds the machine's hostname, and to continue shipping its IPv4 address in hostname so as not to break backwards compatibility.

Customers will expect to see whatever shows up in the DC/OS UI's node panel, which was renamed from 'HOSTNAME' to simply 'Name' between 1.10 and 1.11, so if this is an intentional step towards clearer naming, I'm all for that! 👍

klueska commented 6 years ago

It should be easily changeable to whatever we want to call it. I checked with @Victhar last night about the naming before pushing the PR, and he seemed OK with it. Just let me know what the preferred naming is and I'll update things.

lloesche commented 6 years ago

Just a quick note on this: if the purpose of the label is to filter by machine, the standard in Prometheus would be to use the instance label in your queries. Example: 100 - (avg by(instance) (irate(node_cpu{job="node",mode="idle"}[5m])) * 100)

There's also the __address__ label, which the admin can rewrite to e.g. a custom host label using a rule like:

  - source_labels: [__address__]
    separator: ;
    regex: (.*):.*
    target_label: host
    replacement: $1
    action: replace

So I'm not sure what the use case for a hostname label is. It seems redundant from a Prometheus admin point of view but I might be missing something.
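For readers less familiar with relabeling, the effect of the rule above can be sketched in Python. This is only an illustration of the matching logic, not Prometheus code: the function name `relabel_host` and the sample address `10.0.6.97:9105` are assumptions made for the example.

```python
import re

# Same pattern as in the relabel rule. Prometheus anchors relabel regexes at
# both ends, so re.fullmatch is the closest Python equivalent.
RULE_REGEX = re.compile(r"(.*):.*")


def relabel_host(labels: dict) -> dict:
    """Return a copy of `labels` with `host` derived from `__address__`.

    Hypothetical helper that mimics `action: replace` with `replacement: $1`.
    """
    out = dict(labels)
    match = RULE_REGEX.fullmatch(labels.get("__address__", ""))
    if match:
        # $1 is the first capture group: everything before the last colon.
        out["host"] = match.group(1)
    # If the regex does not match, the label set is left unchanged.
    return out


# Example address is made up for illustration.
print(relabel_host({"__address__": "10.0.6.97:9105"}))
```

The greedy `(.*)` means the port is split off at the last colon, so a target address without a port would not match and would get no host label from this rule.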

StephanErb commented 6 years ago

I agree with @lloesche here, adding a label like this is redundant.

If you really want it somehow, the more lightweight way would be to specify it just once and then add it via joins only when needed (as described here https://www.robustperception.io/how-to-have-labels-for-machine-roles/ for a version example).
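The join idea can be illustrated outside PromQL. The sketch below assumes a single "info" series per instance that carries the hostname label, and copies that label onto other samples with the same instance, roughly what a PromQL `* on(instance) group_left(hostname)` join against an info metric does. The function name `join_hostname` and the data shapes are assumptions made for the example; nothing here is part of mesos_exporter.

```python
def join_hostname(samples, info):
    """Attach `hostname` from an info series to samples with the same `instance`.

    samples: list of (labels, value) pairs, each labels dict has "instance".
    info:    list of label dicts with "instance" and "hostname" (one per instance).
    """
    by_instance = {labels["instance"]: labels["hostname"] for labels in info}
    joined = []
    for labels, value in samples:
        hostname = by_instance.get(labels["instance"])
        if hostname is None:
            # Like a PromQL join, samples with no match on the other side are dropped.
            continue
        joined.append(({**labels, "hostname": hostname}, value))
    return joined


# Made-up data for illustration.
info = [{"instance": "10.0.6.97:9105",
         "hostname": "ip-10-0-6-97.us-west-2.compute.internal"}]
samples = [({"instance": "10.0.6.97:9105", "type": "total"}, 4.0)]
print(join_hostname(samples, info))
```

With this approach the hostname is stored once per instance rather than on every series, and is only attached at query time.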

klueska commented 6 years ago

@Victhar what were the requirements here that prompted wanting the hostname as a label?

philipnrmn commented 6 years ago

I spent some time playing with this and it's true that instance is already present. I'm going to back this out of #69. To be clear, it never touched master.

victhar commented 6 years ago

@klueska I never requested it, not sure how I ended up being a customer of this :) From my perspective, it is generally important to understand where a metric is coming from and to have cluster_id (name) and hostname (or IP) present as tags. I don't mind whether it's an IP or a hostname per se, as long as it clearly indicates the source of the metrics.