gravitl / netclient

Apache License 2.0

[Bug]: Netmaker zombie mechanism randomly removes nodes #285

Closed taladar closed 1 year ago

taladar commented 1 year ago

Contact Details

No response

What happened?

We have been using netmaker for a few weeks now with about 25 users and a couple of servers (overall about 40 nodes).

So far one node has disappeared twice and another once from the netmaker node list.

The access logs show that nobody called a delete route on those nodes and there is also no mention of their removal in the netmaker.service logs in journald.

Version

v0.17.1

What OS are you using?

Linux

Relevant log output

The UUID in these log lines belongs to one of the nodes that disappeared, according to older logs.

Mar 12 12:15:54 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-12 12:15:54 adding  22f907d2-fce1-42b4-bf2e-d8443e7da0de  to zombie list
Mar 12 12:15:54 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-12 12:15:54 adding 22f907d2-fce1-42b4-bf2e-d8443e7da0de to zombie quaratine list
Mar 13 09:37:37 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-13 09:37:37  failed to get node info [22f907d2-fce1-42b4-bf2e-d8443e7da0de]: no result found
Mar 13 09:37:37 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-13 09:37:37 processed request error: no result found
Mar 13 09:36:03 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-13 09:36:03 mq-ping error getting node:  no result found
Mar 13 09:36:03 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-13 09:36:03 error reading database  no result found


Contributing guidelines

- [X] Yes, I did.

taladar commented 1 year ago

Mar 12 12:16:54 hostname-removed netmaker_0.17.0[3969]: [netmaker_0.17.0] 2023-03-12 12:16:54 deleting zombie node nodename-of-missing-node

taladar commented 1 year ago

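This is likely related to the fact that those nodes get ac:de:48:00:11:22 as their MAC address, which Apple seems to use for an iBridge device on all its MacBook Pro models since 2016. Since every affected machine reports the same address, the server presumably treats them as duplicates of each other.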

mattkasun commented 1 year ago

As a workaround, you can specify a MAC address when joining with the -m / --macaddress flag, e.g. ./netclient join -t <token> -m <unique-macaddress> or ./netclient join --user <user> --server <api connection string> --network <network> --macaddress <unique macaddress>
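
One way to come up with a unique value for that flag is to generate a random, locally administered unicast MAC address. The following is only a sketch: randomLocalMAC is a hypothetical helper, not part of netclient, and it assumes the server accepts locally administered addresses. Random addresses are not guaranteed to be unique, although a collision among roughly 40 nodes is very unlikely.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"net"
)

// randomLocalMAC is a hypothetical helper (not part of netclient).
// It returns a random unicast, locally administered MAC address.
func randomLocalMAC() (net.HardwareAddr, error) {
	mac := make(net.HardwareAddr, 6)
	if _, err := rand.Read(mac); err != nil {
		return nil, err
	}
	mac[0] |= 0x02  // set the locally administered bit
	mac[0] &^= 0x01 // clear the multicast bit so the address is unicast
	return mac, nil
}

func main() {
	mac, err := randomLocalMAC()
	if err != nil {
		panic(err)
	}
	// Prints an address in aa:bb:cc:dd:ee:ff form.
	fmt.Println(mac.String())
}
```

The printed value could then be passed to the join command in place of <unique-macaddress>.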

taladar commented 1 year ago

Yes, I should have added that.

In the long run it would probably be good if netclient simply moved on to the next interface's MAC address whenever the first interface it tries has this fixed one.
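
A rough sketch of that idea in Go, purely for illustration: this is not how netclient actually selects its MAC address, and sharedMAC and firstUsableMAC are hypothetical names. It walks the interfaces in order and skips empty addresses as well as the fixed address reported above.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
)

// sharedMAC is the fixed address reported in this issue (assumed to be the
// only address that needs skipping).
var sharedMAC = net.HardwareAddr{0xac, 0xde, 0x48, 0x00, 0x11, 0x22}

// firstUsableMAC is a hypothetical helper. It returns the first address that
// is neither empty (e.g. loopback) nor equal to sharedMAC.
func firstUsableMAC(addrs ...net.HardwareAddr) (net.HardwareAddr, bool) {
	for _, a := range addrs {
		if len(a) == 0 || bytes.Equal(a, sharedMAC) {
			continue
		}
		return a, true
	}
	return nil, false
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	var addrs []net.HardwareAddr
	for _, ifc := range ifaces {
		addrs = append(addrs, ifc.HardwareAddr)
	}
	if mac, ok := firstUsableMAC(addrs...); ok {
		fmt.Println(mac)
	} else {
		fmt.Println("no usable MAC address found")
	}
}
```

Other virtual or otherwise non-unique interfaces could still yield duplicate addresses, so a real fix would likely need a broader filter than this single comparison.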