netdata / netdata-cloud

The public repository of Netdata Cloud. Contribute with bug reports and feature requests.
GNU General Public License v3.0

[Bug]: war room deleted, notification disabled but still receiving alerts #557

Closed pmiriyev closed 2 years ago

pmiriyev commented 2 years ago

Bug description

Hey guys,

I disabled email alerts for some war rooms but am still receiving them. I then deleted those war rooms, hoping the emails would stop, but I am still receiving all alerts related to the deleted war rooms. This has been going on for a few weeks already.

Thanks

Installation method

kickstart.sh

System info

uname -a; grep -HvE "^#|URL" /etc/*release
Linux minio-Q166 4.18.0-348.20.1.el8_5.x86_64 #1 SMP Thu Mar 10 11:31:47 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
/etc/almalinux-release:AlmaLinux release 8.5 (Arctic Sphynx)
/etc/centos-release:AlmaLinux release 8.5 (Arctic Sphynx)
/etc/os-release:NAME="AlmaLinux"
/etc/os-release:VERSION="8.5 (Arctic Sphynx)"
/etc/os-release:ID="almalinux"
/etc/os-release:ID_LIKE="rhel centos fedora"
/etc/os-release:VERSION_ID="8.5"
/etc/os-release:PLATFORM_ID="platform:el8"
/etc/os-release:PRETTY_NAME="AlmaLinux 8.5 (Arctic Sphynx)"
/etc/os-release:ANSI_COLOR="0;34"
/etc/os-release:CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
/etc/os-release:
/etc/os-release:ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
/etc/os-release:ALMALINUX_MANTISBT_PROJECT_VERSION="8.5"
/etc/os-release:
/etc/redhat-release:AlmaLinux release 8.5 (Arctic Sphynx)
/etc/system-release:AlmaLinux release 8.5 (Arctic Sphynx)

Netdata build info

netdata -W buildinfo
Version: netdata v1.36.0-61-nightly
Configure options:  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--datadir=/usr/share' '--includedir=/usr/include' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--disable-dependency-tracking' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CXXFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'PKG_CONFIG_PATH=:/usr/lib/pkgconfig:/usr/share/pkgconfig'
Install type: binpkg-rpm
    Binary architecture: x86_64
    Packaging distro:  
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES 
    ACLK:                       YES
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         NO
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    YES
    EBPF:                    YES
    IPMI:                    YES
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES

Additional info

No response

MrZammler commented 2 years ago

Hi @marliyev !

Sorry if this might sound weird, but the alerts you are receiving do come from Cloud, right? i.e. from states cloud? Could you please share the output of http://localhost:19999/api/v1/info from the agent you're receiving emails about, and send it to manolis@netdata.cloud?

pmiriyev commented 2 years ago

curl http://localhost:19999/api/v1/info
{ "version": "v1.36.0-61-nightly", "uid": "d0a64405-d1f4-4749-905d-ce190538db9d", "mirrored_hosts": [ "pve-Q1" ], "mirrored_hosts_status": [ { "guid": "d0a64405-d1f4-4749-905d-ce190538db9d", "hostname": "pve-Q1", "reachable": true, "hops": 0, "claim_id": "d0a64405-d1f4-4749-905d-ce190538db9d", "node_id": "1d10d439-8569-47bf-9ba5-95f3ce594228" } ], "alarms": { "normal": 127, "warning": 0, "critical": 0 }, "os_name": "Debian GNU/Linux", "os_id": "debian", "os_id_like": "unknown", "os_version": "11 (bullseye)", "os_version_id": "11", "os_detection": "/etc/os-release", "cores_total": "32", "total_disk_space": "7681511964672", "cpu_freq": "5083000000", "ram_total": "134981734400", "container_os_name": "none", "container_os_id": "none", "container_os_id_like": "none", "container_os_version": "none", "container_os_version_id": "none", "container_os_detection": "none", "is_k8s_node": "false", "kernel_name": "Linux", "kernel_version": "5.15.39-3-pve", "architecture": "x86_64", "virtualization": "none", "virt_detection": "systemd-detect-virt", "container": "unknown", "container_detection": "systemd-detect-virt", "cloud_provider_type": "unknown", "cloud_instance_type": "unknown", "cloud_instance_region": "unknown", "host_labels": { "_cloud_provider_type":"unknown", "_cloud_instance_type":"unknown", "_cloud_instance_region":"unknown", "_os_name":"Debian GNU/Linux", "_os_version":"11 (bullseye)", "_kernel_version":"5.15.39-3-pve", "_system_cores":"32", "_system_cpu_freq":"5083000000", "_system_ram_total":"134981734400", "_system_disk_space":"7681511964672", "_architecture":"x86_64", "_virtualization":"none", "_container":"unknown", "_container_detection":"systemd-detect-virt", "_virt_detection":"systemd-detect-virt", "_is_k8s_node":"false", "_install_type":"binpkg-deb", "_prebuilt_arch":"x86_64", "_prebuilt_dist":"[none]", "_aclk_available":"true", "_mqtt_version":"5", "_aclk_proxy":"none", "_aclk_ng_new_cloud_protocol":"true", 
"_is_parent":"false" }, "collectors": [ { "plugin": "proc.plugin", "module": "/proc/net/dev" }, { "plugin": "proc.plugin", "module": "/proc/diskstats" }, { "plugin": "proc.plugin", "module": "/proc/net/sockstat" }, { "plugin": "go.d", "module": "web_log" }, { "plugin": "cgroups.plugin", "module": "/sys/fs/cgroup" }, { "plugin": "cgroups.plugin", "module": "systemd" }, { "plugin": "netdata", "module": "ml" }, { "plugin": "netdata", "module": "stats" }, { "plugin": "python.d.plugin", "module": "postfix" }, { "plugin": "timex.plugin", "module": "" }, { "plugin": "python.d.plugin", "module": "fail2ban" }, { "plugin": "proc.plugin", "module": "ipc" }, { "plugin": "proc.plugin", "module": "/proc/net/stat/nf_conntrack" }, { "plugin": "proc.plugin", "module": "/proc/net/softnet_stat" }, { "plugin": "proc.plugin", "module": "/proc/net/snmp6" }, { "plugin": "proc.plugin", "module": "/proc/net/snmp" }, { "plugin": "proc.plugin", "module": "/proc/net/netstat" }, { "plugin": "proc.plugin", "module": "/proc/net/sockstat6" }, { "plugin": "proc.plugin", "module": "/proc/meminfo" }, { "plugin": "proc.plugin", "module": "/proc/vmstat" }, { "plugin": "proc.plugin", "module": "/proc/softirqs" }, { "plugin": "proc.plugin", "module": "/proc/interrupts" }, { "plugin": "apps.plugin", "module": "" }, { "plugin": "proc.plugin", "module": "/proc/pressure" }, { "plugin": "proc.plugin", "module": "/proc/sys/kernel/random/entropy_avail" }, { "plugin": "proc.plugin", "module": "/proc/loadavg" }, { "plugin": "proc.plugin", "module": "/proc/uptime" }, { "plugin": "proc.plugin", "module": "/proc/stat" }, { "plugin": "ebpf.plugin", "module": "oomkill" }, { "plugin": "ebpf.plugin", "module": "process" }, { "plugin": "ebpf.plugin", "module": "vfs" }, { "plugin": "ebpf.plugin", "module": "socket" }, { "plugin": "diskspace.plugin", "module": "" }, { "plugin": "ebpf.plugin", "module": "sync" }, { "plugin": "ebpf.plugin", "module": "filesystem" }, { "plugin": "ebpf.plugin", "module": "filedescriptor" }, { 
"plugin": "nfacct.plugin", "module": "" }, { "plugin": "go.d", "module": "wireguard" }, { "plugin": "ebpf.plugin", "module": "mount" }, { "plugin": "tc.plugin", "module": "" }, { "plugin": "ebpf.plugin", "module": "hardirq" }, { "plugin": "ebpf.plugin", "module": "softirq" }, { "plugin": "statsd.plugin", "module": "stats" }, { "plugin": "idlejitter.plugin", "module": "" } ], "cloud-enabled": true, "cloud-available": true, "agent-claimed": true, "aclk-available": true, "memory-mode": "dbengine", "multidb-disk-quota": 256, "page-cache-size": 32, "stream-enabled": false, "stream-compression": true, "hosts-available": 1, "https-enabled": true, "buildinfo": "dbengine|Native HTTPS|Netdata Cloud|TLS Host Verification|Machine Learning|Stream Compression|protobuf|JSON-C|libcrypto|libm|zlib|apps|cgroup Network Tracking|CUPS|EBPF|IPMI|NFACCT|perf|slabinfo|Prometheus Remote Write", "release-channel": "nightly", "web-enabled": true, "notification-methods": "SEND_EMAIL", "exporting-enabled": false, "exporting-connectors": "", "allmetrics-prometheus-used": 0, "allmetrics-shell-used": 0, "allmetrics-json-used": 0, "dashboard-used": 0, "charts-count": 923, "metrics-count": 5809, "ml-info": { "charts-to-skip": "anomaly_detection. netdata.", "diff-n": 1, "dimension-anomaly-score-threshold": 0.99, "dimension-rate-threshold": 0.05, "enabled": true, "host-anomaly-rate-threshold": 0.01, "hosts-to-skip": "!*", "idle-window-size": 30.0, "lag-n": 5, "max-kmeans-iters": 1000, "max-train-samples": 14400, "max-window-size": 600.0, "min-train-samples": 900, "min-window-size": 30.0, "random-sampling-ratio": 0.2, "smooth-n": 3, "train-every": 3600, "version": 1, "window-rate-threshold": 0.25
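For anyone triaging a similar report, the Cloud-related fields can be pulled out of this payload instead of reading the whole blob. The following is only a sketch: the key names are copied from the output above and may differ on other agent versions, and `fetch_info` and `cloud_summary` are hypothetical helper names, not part of any Netdata tooling. The `notification-methods` value appears to list the notification methods enabled on the agent itself, which may help tell agent-sent emails apart from Cloud-sent ones.

```python
import json
from urllib.request import urlopen

# Keys copied from the /api/v1/info output above; other agent versions may differ.
CLOUD_KEYS = (
    "cloud-enabled",
    "cloud-available",
    "agent-claimed",
    "aclk-available",
    "notification-methods",
)


def fetch_info(base_url="http://localhost:19999"):
    """Fetch and decode the agent's /api/v1/info payload (hypothetical helper)."""
    with urlopen(base_url + "/api/v1/info", timeout=5) as resp:
        return json.load(resp)


def cloud_summary(info):
    """Return only the Cloud-related fields; missing keys map to None."""
    return {key: info.get(key) for key in CLOUD_KEYS}


# Trimmed sample taken from the output posted above.
sample = {
    "version": "v1.36.0-61-nightly",
    "cloud-enabled": True,
    "cloud-available": True,
    "agent-claimed": True,
    "aclk-available": True,
    "notification-methods": "SEND_EMAIL",
}
summary = cloud_summary(sample)
```

On the agent host itself, `cloud_summary(fetch_info())` would produce the same summary from the live endpoint.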

pmiriyev commented 2 years ago

Hmm, and these deleted nodes and war rooms all still remain on the Cloud dashboard. Weird.

hugovalente-pm commented 2 years ago

Hmm, and these deleted nodes and war rooms all still remain on the Cloud dashboard. Weird.

@marliyev this is indeed strange, could you share with us your space ID? if you prefer you can e-mail it to me at hugo@netdata.cloud

to go a bit back to your original description of the issue

I disabled email alerts for some war rooms but am still receiving them. I then deleted those war rooms, hoping the emails would stop, but I am still receiving all alerts related to the deleted war rooms. This has been going on for a few weeks already.

Here you mentioned you had disabled the alerts for some War Rooms. Was your goal to disable the alerts for some specific nodes? I'm asking because the nodes that were in those War Rooms would still be in the All Nodes room and would be triggering alerts from there, so I'm trying to confirm what the initial task you wanted to complete was.

pmiriyev commented 2 years ago

Nope, I disabled alerts for whole war rooms from the Cloud dashboard and also deleted those war rooms, but they all still remain. I will email you the space ID.

hugovalente-pm commented 2 years ago

thanks for the details @marliyev, I haven't been able to spot anything amiss but will ask the team for additional help.

some further questions to try to help us troubleshoot this:

feel free to share these details here or through e-mail

parviz-dev commented 2 years ago

Let's say the spark and kafka rooms: I disabled alerts for these war rooms and additionally deleted them, but I am still receiving different types of alerts, such as unreachable, disk, or traffic alerts.

hugovalente-pm commented 2 years ago

@marliyev @ptkiki from what we could see, at the moment, you have the notifications active for all the rooms in your space.

please also be aware that even if you delete a room and/or remove a node from a specific room (without deleting an Offline node), the node will still belong to the All Nodes room, and if you have notifications enabled there you may get notified for the node you removed from the other room.

could you provide some concrete examples so we can look into the issues with nodes and rooms not being deleted? sharing the IDs would be best, but the actual names would also help

parviz-dev commented 2 years ago

Yep, I deleted the war rooms but the nodes still remain in All Nodes, and because those war rooms are deleted I can't control alerts for them. So to delete those nodes from All Nodes, I need to unclaim the nodes and delete them as offline nodes?

hugovalente-pm commented 2 years ago

correct @ptkiki, if you just switch the agent off, the nodes will be shown as Offline in Cloud and you can then delete them, but if for some reason the agent is restarted they will appear again in Cloud (the claiming information still exists on the agent under /netdata/cloud.d).

if you really don't want to see them again in Cloud the best is to unclaim them

one other option: if you have your infrastructure segmented into different rooms other than All Nodes, you can use those to control the notifications you want to receive and disable the notifications from the All Nodes room
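The room behaviour described in this thread can be modelled in a few lines. This is a toy sketch of the rule as explained here (a node always stays in All Nodes, and a notification goes out if any room containing the node has notifications enabled); it is not Netdata Cloud's actual implementation, and `rooms_for` and `should_notify` are hypothetical names. The room names are the ones mentioned earlier in the thread.

```python
ALL_NODES = "All Nodes"


def rooms_for(node_rooms):
    """Rooms a node belongs to; All Nodes membership is implicit."""
    return set(node_rooms) | {ALL_NODES}


def should_notify(node_rooms, enabled_rooms):
    """True if any room containing the node has notifications enabled."""
    return bool(rooms_for(node_rooms) & set(enabled_rooms))


# A node that was only in a "kafka" room, with notifications disabled there
# but still enabled on All Nodes: alerts keep arriving.
before = should_notify({"kafka"}, {ALL_NODES})

# Deleting the "kafka" room changes nothing, because the node stays in All Nodes.
after_delete = should_notify(set(), {ALL_NODES})

# Suggested setup: All Nodes disabled, notifications enabled only on chosen rooms.
# Here only "spark" is enabled, so a "kafka"-only node stays quiet.
segmented = should_notify({"kafka"}, {"spark"})
```

Under this model, deleting a room never silences a node by itself; only the notification setting on All Nodes (or removing the node from the space) does.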

let us know in case things don't work for you

parviz-dev commented 2 years ago

Yup, I disabled notifications in exactly the same way a few weeks ago, but notifications are still coming, even after I deleted the war rooms. So I am going to disable the agent on the servers; I see no other way to stop the notifications.

parviz-dev commented 2 years ago

So only after disabling the Netdata agent was I able to stop the notifications, and I was also able to delete those nodes from the Cloud dashboard.

hugovalente-pm commented 2 years ago

thanks for the update @ptkiki and glad that finally you were able to stop the notifications.

it is strange that changing the settings on the rooms didn't work; you shouldn't need to resort to removing the nodes, so we will keep trying to investigate/replicate it on our end. I suggest keeping the ticket open a bit longer until we have some more findings on our side

to confirm about this that you had also shared:

Hmm, and these deleted nodes and war rooms all still remain on the Cloud dashboard. Weird.

this isn't happening any longer? you can delete nodes and war rooms without issues?

pmiriyev commented 2 years ago

Yup, I can delete offline nodes, but disabling notifications in the Cloud dashboard settings doesn't work.

hugovalente-pm commented 2 years ago

@marliyev we tried further to replicate this, using a node that was in both the All Nodes room and a Notifications test room and was triggering alerts.

we tested the following scenarios for the Notifications settings under User profile:

based on this, I suggest closing this bug for now and if you face this issue again it could be re-opened.

hugovalente-pm commented 2 years ago

@marliyev as mentioned in previous comment we will be closing this. if this happens to you again let us know