openebs / lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that is integrated with a backend LVM2 data storage stack.
Apache License 2.0

Metrics not working due to duplicated metrics ( "collected before with the same name and label values" ) #211

Closed patrickjahns closed 1 year ago

patrickjahns commented 1 year ago

What steps did you take and what happened:

Prometheus is no longer able to fetch metrics from a node. Checking the logs revealed the following errors:

openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin E1116 23:44:35.640240       1 agent.go:147] error gathering metrics:10 error(s) occurred:
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "openebs_size_of_volume" { label:<name:"device" value:"dm-0" > label:<name:"volumename" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > gauge:<value:3.758096384e+11 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_total_size_bytes" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:3.758096384e+11 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_used_percent" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:0 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_permission" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:1 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_when_full" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:-1 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_health_status" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:0 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_raid_sync_action" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:-1 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_mda_total_size_bytes" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:0 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_mda_used_percent" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:0 > } was collected before with the same name and label values
openebs-lvm-lvm-localpv-node-rz9mk openebs-lvm-plugin * collected metric "lvm_lv_snap_percent" { label:<name:"active_status" value:"active" > label:<name:"device" value:"dm-0" > label:<name:"dm_path" value:"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281" > label:<name:"host" value:"k-prod-worker-1" > label:<name:"name" value:"pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"path" value:"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281" > label:<name:"pool" value:"" > label:<name:"segtype" value:"linear" > label:<name:"vg" value:"openebs-ssd-0" > gauge:<value:0 > } was collected before with the same name and label values

What did you expect to happen:

Scraping Prometheus metrics should work without any issues.

Anything else you would like to add:

When a logical volume is split into several segments (two in this case), ListLVMLogicalVolume returns several entries for it ( https://github.com/openebs/lvm-localpv/blob/develop/pkg/lvm/lvm_util.go#L854-L867 ). Since the Prometheus collector simply iterates over the results ( https://github.com/openebs/lvm-localpv/blob/develop/pkg/collector/lv_collector.go#L102 ), the same metric is collected more than once with identical label values, which produces the error above.

# lvs --options "lv_all,vg_name,segtype" --reportformat "json"
  {
      "report": [
          {
              "lv": [
                  {"lv_uuid":"WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ", "lv_name":"pvc-72f9517e-749e-4cd1-929f-8b456289d281", "lv_full_name":"openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281", "lv_path":"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281", "lv_dm_path":"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281", "lv_parent":"", "lv_layout":"linear", "lv_role":"public", "lv_initial_image_sync":"", "lv_image_synced":"", "lv_merging":"", "lv_converting":"", "lv_allocation_policy":"inherit", "lv_allocation_locked":"", "lv_fixed_minor":"", "lv_skip_activation":"", "lv_when_full":"", "lv_active":"active", "lv_active_locally":"active locally", "lv_active_remotely":"", "lv_active_exclusively":"active exclusively", "lv_major":"-1", "lv_minor":"-1", "lv_read_ahead":"auto", "lv_size":"350.00g", "lv_metadata_size":"", "seg_count":"2", "origin":"", "origin_uuid":"", "origin_size":"", "lv_ancestors":"", "lv_full_ancestors":"", "lv_descendants":"", "lv_full_descendants":"", "raid_mismatch_count":"", "raid_sync_action":"", "raid_write_behind":"", "raid_min_recovery_rate":"", "raid_max_recovery_rate":"", "raidintegritymode":"", "raidintegrityblocksize":"-1", "integritymismatches":"", "move_pv":"", "move_pv_uuid":"", "convert_lv":"", "convert_lv_uuid":"", "mirror_log":"", "mirror_log_uuid":"", "data_lv":"", "data_lv_uuid":"", "metadata_lv":"", "metadata_lv_uuid":"", "pool_lv":"", "pool_lv_uuid":"", "lv_tags":"", "lv_profile":"", "lv_lockargs":"", "lv_time":"2022-11-16 23:42:33 +0000", "lv_time_removed":"", "lv_host":"k-prod-worker-1", "lv_modules":"", "lv_historical":"", "lv_kernel_major":"253", "lv_kernel_minor":"0", "lv_kernel_read_ahead":"128.00k", "lv_permissions":"writeable", "lv_suspended":"", "lv_live_table":"live table present", "lv_inactive_table":"", "lv_device_open":"open", "data_percent":"", "snap_percent":"", "metadata_percent":"", "copy_percent":"", "sync_percent":"", "cache_total_blocks":"", "cache_used_blocks":"", 
"cache_dirty_blocks":"", "cache_read_hits":"", "cache_read_misses":"", "cache_write_hits":"", "cache_write_misses":"", "kernel_cache_settings":"", "kernel_cache_policy":"", "kernel_metadata_format":"", "lv_health_status":"", "kernel_discards":"", "lv_check_needed":"unknown", "lv_merge_failed":"unknown", "lv_snapshot_invalid":"unknown", "vdo_operating_mode":"", "vdo_compression_state":"", "vdo_index_state":"", "vdo_used_size":"", "vdo_saving_percent":"", "writecache_total_blocks":"", "writecache_free_blocks":"", "writecache_writeback_blocks":"", "writecache_error":"", "lv_attr":"-wi-ao----", "vg_name":"openebs-ssd-0", "segtype":"linear"},
                  {"lv_uuid":"WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ", "lv_name":"pvc-72f9517e-749e-4cd1-929f-8b456289d281", "lv_full_name":"openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281", "lv_path":"/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281", "lv_dm_path":"/dev/mapper/openebs--ssd--0-pvc--72f9517e--749e--4cd1--929f--8b456289d281", "lv_parent":"", "lv_layout":"linear", "lv_role":"public", "lv_initial_image_sync":"", "lv_image_synced":"", "lv_merging":"", "lv_converting":"", "lv_allocation_policy":"inherit", "lv_allocation_locked":"", "lv_fixed_minor":"", "lv_skip_activation":"", "lv_when_full":"", "lv_active":"active", "lv_active_locally":"active locally", "lv_active_remotely":"", "lv_active_exclusively":"active exclusively", "lv_major":"-1", "lv_minor":"-1", "lv_read_ahead":"auto", "lv_size":"350.00g", "lv_metadata_size":"", "seg_count":"2", "origin":"", "origin_uuid":"", "origin_size":"", "lv_ancestors":"", "lv_full_ancestors":"", "lv_descendants":"", "lv_full_descendants":"", "raid_mismatch_count":"", "raid_sync_action":"", "raid_write_behind":"", "raid_min_recovery_rate":"", "raid_max_recovery_rate":"", "raidintegritymode":"", "raidintegrityblocksize":"-1", "integritymismatches":"", "move_pv":"", "move_pv_uuid":"", "convert_lv":"", "convert_lv_uuid":"", "mirror_log":"", "mirror_log_uuid":"", "data_lv":"", "data_lv_uuid":"", "metadata_lv":"", "metadata_lv_uuid":"", "pool_lv":"", "pool_lv_uuid":"", "lv_tags":"", "lv_profile":"", "lv_lockargs":"", "lv_time":"2022-11-16 23:42:33 +0000", "lv_time_removed":"", "lv_host":"k-prod-worker-1", "lv_modules":"", "lv_historical":"", "lv_kernel_major":"253", "lv_kernel_minor":"0", "lv_kernel_read_ahead":"128.00k", "lv_permissions":"writeable", "lv_suspended":"", "lv_live_table":"live table present", "lv_inactive_table":"", "lv_device_open":"open", "data_percent":"", "snap_percent":"", "metadata_percent":"", "copy_percent":"", "sync_percent":"", "cache_total_blocks":"", "cache_used_blocks":"", 
"cache_dirty_blocks":"", "cache_read_hits":"", "cache_read_misses":"", "cache_write_hits":"", "cache_write_misses":"", "kernel_cache_settings":"", "kernel_cache_policy":"", "kernel_metadata_format":"", "lv_health_status":"", "kernel_discards":"", "lv_check_needed":"unknown", "lv_merge_failed":"unknown", "lv_snapshot_invalid":"unknown", "vdo_operating_mode":"", "vdo_compression_state":"", "vdo_index_state":"", "vdo_used_size":"", "vdo_saving_percent":"", "writecache_total_blocks":"", "writecache_free_blocks":"", "writecache_writeback_blocks":"", "writecache_error":"", "lv_attr":"-wi-ao----", "vg_name":"openebs-ssd-0", "segtype":"linear"},
                  {"lv_uuid":"JinK3h-m6tT-KdPW-u4q4-hT3d-EQCR-uQNHMC", "lv_name":"pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a", "lv_full_name":"openebs-ssd-0/pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a", "lv_path":"/dev/openebs-ssd-0/pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a", "lv_dm_path":"/dev/mapper/openebs--ssd--0-pvc--ab10ffa5--5175--41ff--8b7e--f5d0104bb04a", "lv_parent":"", "lv_layout":"linear", "lv_role":"public", "lv_initial_image_sync":"", "lv_image_synced":"", "lv_merging":"", "lv_converting":"", "lv_allocation_policy":"inherit", "lv_allocation_locked":"", "lv_fixed_minor":"", "lv_skip_activation":"", "lv_when_full":"", "lv_active":"active", "lv_active_locally":"active locally", "lv_active_remotely":"", "lv_active_exclusively":"active exclusively", "lv_major":"-1", "lv_minor":"-1", "lv_read_ahead":"auto", "lv_size":"1.00g", "lv_metadata_size":"", "seg_count":"1", "origin":"", "origin_uuid":"", "origin_size":"", "lv_ancestors":"", "lv_full_ancestors":"", "lv_descendants":"", "lv_full_descendants":"", "raid_mismatch_count":"", "raid_sync_action":"", "raid_write_behind":"", "raid_min_recovery_rate":"", "raid_max_recovery_rate":"", "raidintegritymode":"", "raidintegrityblocksize":"-1", "integritymismatches":"", "move_pv":"", "move_pv_uuid":"", "convert_lv":"", "convert_lv_uuid":"", "mirror_log":"", "mirror_log_uuid":"", "data_lv":"", "data_lv_uuid":"", "metadata_lv":"", "metadata_lv_uuid":"", "pool_lv":"", "pool_lv_uuid":"", "lv_tags":"", "lv_profile":"", "lv_lockargs":"", "lv_time":"2022-11-10 17:45:54 +0000", "lv_time_removed":"", "lv_host":"k-prod-worker-1", "lv_modules":"", "lv_historical":"", "lv_kernel_major":"253", "lv_kernel_minor":"2", "lv_kernel_read_ahead":"128.00k", "lv_permissions":"writeable", "lv_suspended":"", "lv_live_table":"live table present", "lv_inactive_table":"", "lv_device_open":"open", "data_percent":"", "snap_percent":"", "metadata_percent":"", "copy_percent":"", "sync_percent":"", "cache_total_blocks":"", "cache_used_blocks":"", 
"cache_dirty_blocks":"", "cache_read_hits":"", "cache_read_misses":"", "cache_write_hits":"", "cache_write_misses":"", "kernel_cache_settings":"", "kernel_cache_policy":"", "kernel_metadata_format":"", "lv_health_status":"", "kernel_discards":"", "lv_check_needed":"unknown", "lv_merge_failed":"unknown", "lv_snapshot_invalid":"unknown", "vdo_operating_mode":"", "vdo_compression_state":"", "vdo_index_state":"", "vdo_used_size":"", "vdo_saving_percent":"", "writecache_total_blocks":"", "writecache_free_blocks":"", "writecache_writeback_blocks":"", "writecache_error":"", "lv_attr":"-wi-ao----", "vg_name":"openebs-ssd-0", "segtype":"linear"}
              ]
          }
      ]
  }

For comparison, lvdisplay lists each logical volume only once and reports 2 segments for the affected volume:

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/openebs-ssd-0/pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a
  LV Name                pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a
  VG Name                openebs-ssd-0
  LV UUID                JinK3h-m6tT-KdPW-u4q4-hT3d-EQCR-uQNHMC
  LV Write Access        read/write
  LV Creation host, time k-prod-worker-1, 2022-11-10 17:45:54 +0000
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281
  LV Name                pvc-72f9517e-749e-4cd1-929f-8b456289d281
  VG Name                openebs-ssd-0
  LV UUID                WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ
  LV Write Access        read/write
  LV Creation host, time k-prod-worker-1, 2022-11-16 23:42:33 +0000
  LV Status              available
  # open                 1
  LV Size                350.00 GiB
  Current LE             89600
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

lvscan also shows only one entry per volume:

# lvscan
  ACTIVE            '/dev/openebs-ssd-0/pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a' [1.00 GiB] inherit
  ACTIVE            '/dev/openebs-ssd-0/pvc-72f9517e-749e-4cd1-929f-8b456289d281' [350.00 GiB] inherit
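
To make the mismatch concrete, here is a minimal Go sketch that counts how many rows an lvs JSON report contains per lv_uuid. It is only an illustration: the lvsReport type and the countRowsPerLV helper are hypothetical names and not part of lvm-localpv, and the input is a heavily trimmed copy of the report above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lvsReport mirrors only the parts of the `lvs --reportformat json` output
// shown above. Every field value in that output is a JSON string.
// (Hypothetical type, not taken from lvm-localpv.)
type lvsReport struct {
	Report []struct {
		LV []map[string]string `json:"lv"`
	} `json:"report"`
}

// countRowsPerLV returns the number of rows in the report for each lv_uuid.
// A count above 1 means the same logical volume was listed more than once.
func countRowsPerLV(raw []byte) (map[string]int, error) {
	var r lvsReport
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, rep := range r.Report {
		for _, lv := range rep.LV {
			counts[lv["lv_uuid"]]++
		}
	}
	return counts, nil
}

func main() {
	// Trimmed copy of the report above: two rows for the volume with
	// seg_count 2, one row for the volume with seg_count 1.
	raw := []byte(`{"report":[{"lv":[
		{"lv_uuid":"WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ","seg_count":"2"},
		{"lv_uuid":"WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ","seg_count":"2"},
		{"lv_uuid":"JinK3h-m6tT-KdPW-u4q4-hT3d-EQCR-uQNHMC","seg_count":"1"}
	]}]}`)

	counts, err := countRowsPerLV(raw)
	if err != nil {
		panic(err)
	}
	for uuid, n := range counts {
		fmt.Printf("%s: %d row(s)\n", uuid, n)
	}
}
```

In the report above, the row count for each volume matches its seg_count, while lvdisplay and lvscan list each volume once.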

A similar issue was reported for a different project; see https://github.com/storaged-project/libblockdev/issues/667

A search for ListLVMLogicalVolume shows that it is currently used only for the metrics. I therefore propose either deduplicating the results of ListLVMLogicalVolume or skipping duplicate entries within the Prometheus collector loop.
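
A minimal sketch of the first option, assuming the entries can be keyed by their LV UUID. The LogicalVolume struct here is a hypothetical, trimmed stand-in for whatever ListLVMLogicalVolume actually returns, and dedupeByUUID is an illustrative helper name, not existing code in the repository.

```go
package main

import "fmt"

// LogicalVolume is a hypothetical, trimmed stand-in for the entries
// returned by ListLVMLogicalVolume; the real struct has more fields.
type LogicalVolume struct {
	UUID   string
	Name   string
	VGName string
}

// dedupeByUUID keeps the first entry seen for each UUID and preserves
// the original order of the remaining entries.
func dedupeByUUID(lvs []LogicalVolume) []LogicalVolume {
	seen := make(map[string]struct{}, len(lvs))
	out := make([]LogicalVolume, 0, len(lvs))
	for _, lv := range lvs {
		if _, ok := seen[lv.UUID]; ok {
			continue // already emitted an entry for this logical volume
		}
		seen[lv.UUID] = struct{}{}
		out = append(out, lv)
	}
	return out
}

func main() {
	// Same shape as the lvs output in this report: the two-segment
	// volume appears twice, the single-segment volume once.
	lvs := []LogicalVolume{
		{UUID: "WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ", Name: "pvc-72f9517e-749e-4cd1-929f-8b456289d281", VGName: "openebs-ssd-0"},
		{UUID: "WD7z1X-4mGk-lH26-yQCH-vOmJ-8qDR-Qdw0EJ", Name: "pvc-72f9517e-749e-4cd1-929f-8b456289d281", VGName: "openebs-ssd-0"},
		{UUID: "JinK3h-m6tT-KdPW-u4q4-hT3d-EQCR-uQNHMC", Name: "pvc-ab10ffa5-5175-41ff-8b7e-f5d0104bb04a", VGName: "openebs-ssd-0"},
	}
	fmt.Println(len(dedupeByUUID(lvs))) // prints 2
}
```

Keeping the first entry is enough for the case shown here, where both rows are identical. If segments of one volume could differ in a field that is exported as a label (for example segtype), a real fix would need to decide which value to report.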

If agreed upon, I can provide a PR with a fix for this issue.

luks commented 1 year ago

I am having the same problem with lvm-driver:0.8.3

  LVM version:     2.02.176(2) (2017-11-03)
  Library version: 1.02.145 (2017-11-03)
  Driver version:  4.41.0

jeffguorg commented 1 year ago

Any updates? We have the exact same problem.

openebs/lvm-driver:1.0.0

Mosibi commented 1 year ago

Same problem with openebs/lvm-driver:1.1.0