Closed kellervater closed 2 years ago
How are you installing/updating netclient? If you are using your distro package manager, the latest packages should enable the netclient service so that it starts after a reboot.
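Whether the package actually enabled the service can be checked by asking systemd directly. The following is a minimal sketch, not an official procedure: it assumes the unit is called netclient.service (the comment above only says "the netclient service", so the exact unit name is an assumption), and `unit_enabled` and `report_unit` are hypothetical helper names.

```shell
#!/bin/sh
# unit_enabled UNIT: succeed only if systemd reports UNIT as "enabled",
# i.e. it is configured to start at boot.
unit_enabled() {
    # "systemctl is-enabled" prints the enablement state; "|| true" keeps a
    # non-zero exit (e.g. for "disabled") from aborting scripts run with set -e.
    state=$(systemctl is-enabled "$1" 2>/dev/null) || true
    [ "$state" = "enabled" ]
}

# report_unit UNIT: print a one-line summary for UNIT.
report_unit() {
    if unit_enabled "$1"; then
        echo "$1: enabled at boot"
    else
        echo "$1: NOT enabled at boot"
    fi
}
```

Running `report_unit netclient.service` on an affected node would show whether the unit is set to start at boot (the unit name is assumed, adjust it to whatever the package installs).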
I have the same issue using a RPi and an Ubuntu 18.04 VM, using a clean install of netclient v0.14.5 (latest version at the moment)
I did a fresh install on the weekend, but it's still the same. First I uninstalled netclient on all nodes and then reinstalled it with Ansible playbooks:
# ansible tasks
- hosts: nodes:rancher
  tasks:
    - name: uninstall netclient
      shell: |
        netclient uninstall
      become: yes
      register: _result
      failed_when: "_result.rc != 0 and 'netclient: not found' not in _result.stderr"
      changed_when: "'uninstalled netclient' in _result.stderr"
    - name: uninstall apt dependency
      ansible.builtin.apt:
        pkg: netclient
        state: absent
      become: yes
The above tasks translate to:
sudo netclient uninstall
sudo apt remove netclient
Then the fresh installation via ansible (excerpt):
# netmaker installation comes first
...
- hosts: nodes:rancher
  any_errors_fatal: true
  pre_tasks:
    - name: Netclient prerequisites
      shell: |
        curl -sL 'https://apt.netmaker.org/gpg.key' | tee /etc/apt/trusted.gpg.d/netclient.asc
        curl -sL 'https://apt.netmaker.org/debian.deb.txt' | tee /etc/apt/sources.list.d/netclient.list
      become: yes
    - name: Install packages
      apt:
        pkg:
          - netclient={{ netclient_version }} # netclient_version: 0.14.5-2
      become: yes
  tasks:
    - name: Join Network
      shell: |
        netclient join -t {{ hostvars['netmaker']['network_access_token'] }} {% if netmaker_ip|length > 0 %}--address {{ netmaker_ip }}{% endif %}
      register: join_result
      changed_when: "'ALREADY_INSTALLED' not in join_result.stdout"
      become: yes
    ...
    # then make all nodes static via API
    ...
    - name: Pull latest config
      shell: |
        netclient pull -n {{ hostvars['netmaker']['network']['id'] }}
      become: yes
    - name: ping all peers (including self)
      shell: |
        ping "{{ item }}" -c 1
      register: result
      retries: 5
      delay: 5
      until: result.rc == 0
      loop: "{{ nodes.json|map(attribute='address') }}"
After this everything works fine until a reboot is executed.
After a reboot I'd expect my peers to be pingable, but even after 10 minutes nothing happens. With a netclient pull, however, it works instantly.
Since the WireGuard network forms the base of my k8s cluster, nodes can no longer recover automatically. Right now, though, it's more of an annoyance than an issue.
I'd like to add that I'm also having this problem on a fresh install of 0.14.6. My initial install (0.12.2) did not have this problem. I kept upgrading with each release, and sometime around when the netclient repository became available, this problem began. After updating to 0.14.3 I waited for a few releases and then attempted a fresh install of Netmaker on the VPS, assuming the problem might have to do with upgrading from an old config file, but the problem is still there.
I have Netmaker installed on a Hetzner VPS running Ubuntu 22.04. I have netclient installed on Raspberry Pi OS (Debian bullseye) and Ubuntu Desktop 22.04, and the problem exists on both boxes. Further, on the Ubuntu box, after I run 'netclient pull' my outbound connection to the internet gets corrupted and I have to manually disable and enable the wired connection to get everything working again... Not very convenient when I'm away from the boxes.
Did you ever resolve this issue?
Haven't tested on v0.15.0 so far. But on the version mentioned above I just created a startup script which does a netclient pull.
@ppoetz Don't suppose you have a copy of that script do you?
@martinkeat
Service script:
[Unit]
Description=Run a netclient pull

[Service]
# oneshot: run the pull once at boot and keep the unit marked active afterwards
Type=oneshot
RemainAfterExit=yes
User=root
Group=root
UMask=1000
ExecStart=/usr/sbin/netclient pull

[Install]
WantedBy=multi-user.target
Then run:
sudo systemctl daemon-reload
sudo systemctl enable netclientpull.service
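A possible variation on the workaround above is to pull only when a peer is actually unreachable, mirroring the ping retries from the Ansible play earlier in the thread. This is just a sketch under stated assumptions: `retry`, `ensure_peer`, `RETRIES` and `DELAY` are hypothetical names, and only `ping -c 1` and `netclient pull` are taken from the commands already shown in this thread.

```shell
#!/bin/sh
# Tunables (hypothetical names): number of ping attempts and seconds between them.
RETRIES=${RETRIES:-5}
DELAY=${DELAY:-5}

# retry CMD [ARGS...]: run CMD up to $RETRIES times, sleeping $DELAY seconds
# between failed attempts. Succeeds as soon as CMD succeeds.
retry() {
    attempt=1
    while [ "$attempt" -le "$RETRIES" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$RETRIES" ]; then
            sleep "$DELAY"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# ensure_peer PEER: if PEER does not answer pings, run "netclient pull"
# once and check again. Fails if the peer is still unreachable afterwards.
ensure_peer() {
    retry ping -c 1 "$1" && return 0
    netclient pull || return 1
    retry ping -c 1 "$1"
}
```

A startup script could then call `ensure_peer` with the address of one known peer and exit non-zero if it fails, so that systemd records the failed recovery instead of silently reporting success.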
@ppoetz is this still an issue?
I was able to resolve it by manually editing the .service file after raising the issue on discord (this was before the solution was posted above). I haven't had any problems with updates since then, but I can't say whether it would've resolved itself in an update if I hadn't intervened.
@mattkasun will upgrade my cluster on Saturday to the latest version and then give you an update on this. If not, I'll try the workaround @ghgeiger mentioned.
So... I performed an upgrade from v0.14.5 to v0.16.0 on Netmaker as well as an update from v0.14.5-2 to v0.16.0-2 on all netclients (apt) in our networks.
And my issue is resolved! I can now reboot any instance and instantly ping other nodes! Thank you very much!
Contact Details
patrick.poetz@voo.aero
What happened?
Since the last upgrade (v0.14.0 -> v0.14.5), my nodes need to pull their config again to rejoin the network. I run 3 bare metal nodes with netclient installed via apt, which later form a k8s cluster. The nodes don't rejoin the network after a restart or crash, so recovery needs either an extra startup script that executes netclient pull or manual intervention.
Version
v0.14.5
What OS are you using?
Linux
Relevant log output