Closed andrewm4894 closed 1 month ago
This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:
https://community.netdata.cloud/t/cant-delete-stale-nodes/3909/2
I found you can delete them if you delete the parent and re-install fresh on that parent machine. You have to remove the parent and all vnodes from the cloud dashboard and then when you re-claim the parent host it will set things up fresh.
I tried to erase my historical data directly to see if that would clear it up, as a workaround until netdata makes an official way to do this. I opened up the list of stale nodes, moused over the stale node I wanted to delete, and copied a link like https://EXAMPLE.ORG/v2/spaces/DOMAINTLD/rooms/local/nodes/888586af-e5ab-47f2-8094-c4948fd1243a.
Then I extracted the UUID and deleted the folder that holds its data on my parent node:
systemctl stop netdata
cd /var/lib/netdata
rm -r 888586af-e5ab-47f2-8094-c4948fd1243a ... # deleting each of the folders
systemctl start netdata
After the restart, the charts are gone, but the node itself is still listed as "stale".
So that wasn't enough.
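For anyone repeating the "extract the UUID from the link" step, here is a minimal Python sketch. It assumes the link has the same shape as the example above, with the UUID in the path segment right after `nodes`; the function name `node_uuid_from_link` is made up for illustration and is not part of any Netdata tooling.

```python
import uuid
from urllib.parse import urlparse


def node_uuid_from_link(link: str) -> str:
    """Return the UUID that follows the 'nodes' path segment of a dashboard link."""
    # Assumption: the link looks like .../rooms/<room>/nodes/<uuid>
    segments = [s for s in urlparse(link).path.split("/") if s]
    if "nodes" not in segments:
        raise ValueError("no 'nodes' segment in link")
    idx = segments.index("nodes")
    if idx + 1 >= len(segments):
        raise ValueError("nothing follows the 'nodes' segment")
    # uuid.UUID raises ValueError if the segment is not a valid UUID
    return str(uuid.UUID(segments[idx + 1]))


if __name__ == "__main__":
    print(node_uuid_from_link(
        "https://EXAMPLE.ORG/v2/spaces/DOMAINTLD/rooms/local/nodes/"
        "888586af-e5ab-47f2-8094-c4948fd1243a"
    ))
```

Validating with `uuid.UUID` before using the value in an `rm -r` is a cheap guard against a mangled copy-paste.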
I poked around some more and found this sqlite database:
root@monitor:~# sqlite3 /var/cache/netdata/netdata-meta.db
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> .tables
alert_hash dimension host metadata_migration
chart health_log host_info node_instance
chart_label health_log_detail host_label
sqlite> select * from host where hostname='host1.example.org';
host_id|hostname|registry_hostname|update_every|os|timezone|tags|hops|memory_mode|abbrev_timezone|utc_offset|program_name|program_version|entries|health_enabled
<binary>|host1.example.org|host1.example.org|15|linux|America/Toronto||1|5|EST|-18000|netdata|v1.33.1|0|1
<binary>|host1.example.org|host1.example.org|15|linux|Etc/UTC||1|5|EST|-18000|netdata|v1.42.1|0|1
Annoyingly, host_id (presumably the UUID) is stored in binary while the rest is stored as text, but I was able to remove the entry with:
sqlite> delete from host where hostname='host1.example.org' and program_version='v1.33.1';
After another
root@monitor:~# systemctl restart netdata
the stale node is now gone from my dashboard. :tada:
Unfortunately this is not very clean. I believe there are still entries in the host_label, host_info and node_instance tables referencing the deleted host_id, but I don't know how to input binary data in the sqlite CLI and I don't feel like digging out python right now to do it, so the garbage is just going to sit around.
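For the leftover rows, a rough Python sketch of the cleanup could look like the following. It sidesteps typing binary by reading the host_id blob from the host table and passing it back as a bound parameter. This is untested against a real netdata-meta.db: it assumes host_label, host_info and node_instance each have a host_id column holding the same binary value as host.host_id (check with `.schema` first), and `purge_host` is a hypothetical helper name. It has to run before the host row is deleted, since it looks the blob up by hostname and version.

```python
import sqlite3

# Assumption: each of these tables has a host_id column that stores the
# same binary value as host.host_id. Verify with ".schema <table>" first.
RELATED_TABLES = ("host_label", "host_info", "node_instance")


def purge_host(conn, hostname, program_version):
    """Delete matching host rows and their related rows; return row counts per table."""
    rows = conn.execute(
        "SELECT host_id FROM host WHERE hostname = ? AND program_version = ?",
        (hostname, program_version),
    ).fetchall()
    deleted = {table: 0 for table in RELATED_TABLES}
    deleted["host"] = 0
    with conn:  # single transaction: commit on success, roll back on error
        for (host_id,) in rows:
            for table in RELATED_TABLES:
                cur = conn.execute(
                    f"DELETE FROM {table} WHERE host_id = ?", (host_id,)
                )
                deleted[table] += cur.rowcount
            cur = conn.execute("DELETE FROM host WHERE host_id = ?", (host_id,))
            deleted["host"] += cur.rowcount
    return deleted
```

Usage would be something like `purge_host(sqlite3.connect("/var/cache/netdata/netdata-meta.db"), "host1.example.org", "v1.33.1")`, with netdata stopped and a backup copy of the database made first. Within the sqlite CLI itself, `hex(host_id)` prints a blob readably and a blob can be written as a literal of the form `X'...'`.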
I had an installation problem with a node and now it's marked as "Stale" and "delete is disabled". The node is dead and will never be coming back. How do I get rid of this thing? Is there really no way to delete this??
This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:
https://community.netdata.cloud/t/impossible-to-delete-stale-node/5537/1
@netdata-community-bot funny. that's MY post.
+1 bumping this feature request. I'd hate to have to hack around in a database to be able to get rid of stale machines that landed there by accident, and waiting for the data to expire seems like an inelegant alternative.
edit: It turns out there is a way, but it's not GUI-friendly. Got this from https://community.netdata.cloud/t/impossible-to-delete-stale-node/5537/3
- From app.netdata.cloud, navigate to your Node list
- Next to the name of the Stale node, click on the little (i) symbol (View node information)
- At the very bottom of the panel that opens to the right, you will see a "View node info in json" button - click it. You should see a message that says “JSON copied to clipboard”
- Paste that into a text editor.
- Grab the value of the id: {...} key. This should be a string in UUID format, e.g. 6e072590-a422-45b2-bdab-cdd3fb14ad68
- Connect to your parent node via SSH
- Execute the following command:
netdatacli remove-stale-node {uuid}
substituting {uuid} above with your real one
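If you have several stale nodes, the copy-paste part of the steps above can be scripted. This is only a sketch: it assumes the copied JSON has a top-level `id` key holding the UUID string (the steps above just say "the id key", so the exact nesting is a guess), and `remove_stale_node_command` is a made-up helper name. It only builds the command; nothing is executed.

```python
import json
import uuid


def remove_stale_node_command(node_info_json: str) -> list:
    """Build the netdatacli argument list from the copied node-info JSON."""
    info = json.loads(node_info_json)
    # Assumption: the UUID sits under a top-level "id" key.
    # uuid.UUID raises ValueError if the value is not a valid UUID.
    node_id = str(uuid.UUID(info["id"]))
    return ["netdatacli", "remove-stale-node", node_id]


if __name__ == "__main__":
    pasted = '{"id": "6e072590-a422-45b2-bdab-cdd3fb14ad68"}'
    print(" ".join(remove_stale_node_command(pasted)))
```

On the parent node, the returned list could be passed to `subprocess.run(cmd, check=True)` to actually run it.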
@darxtorm Until they make this easier, here are steps I took recently to remove a stale node, which were kindly provided by @ilyam8. Worked for me.
- Next to the name of the Stale node, click on the little (i) symbol (View node information)
- Grab the value of the id: {...} key. This should be a string in UUID format, e.g. 6e072590-a422-45b2-bdab-cdd3fb14ad68
- Execute netdatacli remove-stale-node {uuid}, substituting {uuid} above with your real one, obviously…
netdatacli aclk-state
netdatacli remove-stale-node {uuid}
Absolutely, it's mildly clunky to say the least. Wanted to add that for cloud at least, after I had performed the above, I also had to go to Manage Space -> Nodes and perform a delete in there (the node was now showing as Offline rather than Stale, and the delete button was no longer disabled) to truly get rid of the ghost!
I used netdatacli remove-stale-node on a bunch of stale nodes but it didn't have any effect, other than changing the Node ID in the netdatacli aclk-state output from a UUID to null.
Is there something else I'm missing? Each time I'd run the command, it would say something like:
Unregistering node with machine guid 83fb052f-49ee-11ab-b00f-3e2f6b85cde4, hostname = dc413990ab4a
(We had a bunch of test containers spin up and they all "registered" with our (on-prem) Netdata instance and now I can't figure out how to remove them...)
@eddyg Restarting the parent node should make them disappear from the UI.
@stelfrag see https://github.com/netdata/netdata-cloud/issues/690#issuecomment-2259543333, is it expected that a restart is required?
@sashwathn hey, I think we need to allow removing stale nodes from the UI. It will simplify users' lives tremendously.
Thanks for following up on this, Ilya!
Removing stale nodes:
ALL_NODES
keyword)
Problem
As a user I can only delete "offline" nodes from NC. I should be able to delete any nodes I want.
We need to split the problem into cases with node status as a key:
Example: I had a group of 11 nodes streaming to my parent. I deleted these VMs since I no longer need them. However, I still see them in Netdata Cloud and am unable to delete them from NC.
Should I not be able to delete them? Unsure if this is a bug or feature request.
These nodes are gone and never coming back, so I would like to remove them from NC. I guess eventually the data for them might fall away on my parent, and maybe then they would show as offline in NC and I could delete them. Unsure.
https://netdata-cloud.slack.com/archives/CS3PB0VJ7/p1671026396555759
Description
Importance
must have
Value proposition
Proposed implementation
No response