Closed bbigras closed 1 year ago
So I'm seeing shutdown logs in there too, not sure if that was you or from an error. Some of those ping failures you're seeing are usually benign. When a new connection is established, both sides will do a short ping to initiate the connection. It doesn't always work - but the heartbeats usually end up starting it anyway.
Those multiple failed heartbeats are a little more concerning - unless you had shut down the other side of the connection.
Gotcha.
Here's an asciinema video (top panel is my desktop (server), the 2 other panels are my laptop (client)) https://asciinema.org/a/uuKQNDtPIdaguQ1WSAdTMmse3
Note that I'm already using both tailscale and wireguard on both computers.
I can take a look in a little bit and see what's going on. One other thing to check - are there overlapping address assignments between Tailscale, your devices, and webmesh? On the webmesh side you can configure the internal IPv4 prefix at bootstrap - the IPv6 one is randomly generated.
One way to rule that out would be to try running it with ipv4 disabled.
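A quick way to check for overlapping address assignments is to compare the prefixes directly. This is a minimal sketch using Python's standard `ipaddress` module. The Tailscale range (100.64.0.0/10) is its documented CGNAT range; the LAN subnet and the webmesh IPv4 prefix are assumptions for illustration, so substitute the values from your own setup.

```python
import ipaddress
from itertools import combinations

# Prefixes to compare. Only the Tailscale range is a known fact here;
# the LAN and webmesh prefixes are assumed values for illustration.
prefixes = {
    "tailscale": ipaddress.ip_network("100.64.0.0/10"),  # Tailscale CGNAT range
    "lan": ipaddress.ip_network("192.168.2.0/24"),       # assumed LAN subnet
    "webmesh": ipaddress.ip_network("172.16.0.0/12"),    # assumed mesh IPv4 prefix
}


def find_overlaps(named):
    """Return sorted (name, name) pairs whose networks overlap."""
    return sorted(
        (a, b)
        for (a, net_a), (b, net_b) in combinations(sorted(named.items()), 2)
        if net_a.overlaps(net_b)
    )


print(find_overlaps(prefixes) or "no overlapping prefixes")
```

If any pair is reported, routes for one network can shadow the other, which is the situation that disabling IPv4 on the mesh would rule out.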
Here's another recording with --global.no-ipv4 and --no-ipv4.
https://asciinema.org/a/Q26LZhOZCCHVIxUb0CgJoWvjT
Note that I'm using webmesh with tailscale (100.85.215.110 is a tailscale ip).
Is there something like --global.primary-endpoint but for wmctl connect?
It works if I connect using lan ips instead of tailscale.
EDIT: but it doesn't if I try to connect to my vps from my desktop. Only my vps's ports are open.
To the wmctl connect question: there isn't. That utility was originally just for testing, but I decided to keep it around. It acts like a NAT'd VPN client by default. You just poke out.
At least one node needs to be accessible currently for peerings to work. But I am currently working on other methods of discovery.
In my tests, at least one node was accessible.
I'll do more tests, though.
I hope to be able to look more closely tonight. One more thing that you can try (it's a really easy way to shoot yourself in the foot, and I need to document it better): since this is a sort of "zero-trust" solution, the default behavior only lets people peer up if there is a Network ACL allowing it. Those are managed via the admin API, but you can set a default allow-all rule at bootstrap with --bootstrap.default-network-policy=accept. You'll see something similar in most of the examples.
Here's a nixpkgs test. It works only with the --global.primary-endpoint 192.168.2.101 line. If I remove it I get https://asciinema.org/a/H4bHoph8bTN0NlUd0FmR73OfG . Note that it's possible that the VMs have more than one network interface each, since setting an IP for eth1 might have created a new interface if the default one is not eth1.
I'll test again with my vps.
Don't hesitate to ask if you want to know how to run this test with nixpkgs.
import ./make-test-python.nix ({ pkgs, ... }: {
  name = "webmesh";
  meta.maintainers = with pkgs.lib.maintainers; [ bbigras ];
  nodes = {
    server = {
      networking = {
        interfaces.eth1 = {
          ipv4.addresses = [
            { address = "192.168.2.101"; prefixLength = 24; }
          ];
        };
        firewall = {
          trustedInterfaces = [ "webmesh0" ];
          allowedTCPPorts = [
            8443
            9443
          ];
          allowedUDPPorts = [
            51820
          ];
        };
      };
      systemd.services.webmesh = {
        after = [ "network-online.target" ];
        wantedBy = [ "multi-user.target" ];
        script = ''
          ${pkgs.webmesh}/bin/webmesh-node \
            --global.insecure \
            --global.no-ipv6 \
            --global.detect-endpoints \
            --global.detect-private-endpoints \
            --bootstrap.enabled \
            --bootstrap.default-network-policy=accept \
            --global.primary-endpoint 192.168.2.101
        '';
      };
    };
    client = {
      networking = {
        interfaces.eth1 = {
          ipv4.addresses = [
            { address = "192.168.2.102"; prefixLength = 24; }
          ];
        };
        firewall = {
          trustedInterfaces = [ "webmesh0" ];
        };
      };
      systemd.services.wmctl = {
        after = [ "network-online.target" ];
        wantedBy = [ "multi-user.target" ];
        script = ''
          ${pkgs.webmesh}/bin/wmctl \
            connect --insecure --no-ipv6 --join-server=192.168.2.101:8443
        '';
      };
    };
  };
  testScript =
    ''
      server.start()
      server.wait_for_unit("webmesh.service")
      server.wait_for_open_port(8443)
      client.start()
      client.wait_for_unit("wmctl.service")
      client.wait_for_open_port(9443)
      client.succeed("ping -c1 172.16.0.1")
      client.sleep(120)
      client.succeed("ping -c1 172.16.0.1")
    '';
})
I'm gonna try to replicate it locally - and if I fail I may reach out for more info.
Also note that I don't have to use --global.primary-endpoint 192.168.2.101 if I disable eth0 in my test.
I think I've found an issue - not sure if it is related to yours. But in a similar setup the second client keeps dropping the peer. I'll let you know what I figure out.
Just to give you a quick update. I'm trying to hammer out one last bug and then I'll tag a new release.
I think most related to your issue was in the storage update trigger on each node. It was doing some hacky logic to make sure it was fully up to date - and a recent fix had made that no longer necessary. I think that old hack was causing peer refreshes not to happen at the right times.
I'm having another issue where peer refreshes are sometimes returning the wrong internal IPs to be set to the allowed IPs. I'm not sure if this is something you are experiencing - but I hope to figure it out before I push this other fix out.
I'm still working on the second issue - but if you are able to build from main you can see if the first one was your problem. If you can't build yourself, the CI will have an image in a little bit.
I still need --global.primary-endpoint with 56a9f6b671e82d862e34187e6956a1de2af16371 . I mean my nix test, not my real test with my vps.
That is likely unrelated. Detection is non-deterministic. So if you want a specific one to be tagged as the primary - you have to specify. That being said - I can try to look into it more.
Worth noting there are also the --mesh.*-endpoint options for more granular control.
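To illustrate why an explicit primary endpoint matters on a host with more than one interface, here is a small sketch. It is not webmesh's actual detection code: `pick_primary` is a hypothetical helper, and the 10.0.2.15 address is an assumed example of a second interface. It only shows the general pattern of an explicit setting taking precedence over an arbitrary pick from the detected set.

```python
from typing import Iterable, Optional


def pick_primary(detected: Iterable[str], explicit: Optional[str] = None) -> str:
    """Pick the endpoint to advertise (hypothetical helper, not webmesh code)."""
    if explicit is not None:
        return explicit  # an explicitly configured endpoint always wins
    candidates = set(detected)  # set iteration order should not be relied on
    if not candidates:
        raise ValueError("no endpoints detected and none specified")
    return next(iter(candidates))  # arbitrary pick among detected addresses


# Two detected addresses, e.g. one per interface (10.0.2.15 is an assumed example).
detected = ["10.0.2.15", "192.168.2.101"]
print(pick_primary(detected, explicit="192.168.2.101"))
```

Without the explicit argument, either address could be returned, which matches the observation that the test passes once the extra interface is disabled.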
I'm testing with my desktop to my vps and it seems to stay connected now. :tada:
I don't mind using --global.primary-endpoint :)
Sweet - I fixed my other bug so about to tag a new release. Will close this, but feel free to open a new issue if anything arises.
I'm following the https://webmeshproj.github.io/guides/personal-vpn/ guide. I'm able to connect and some ping works, but I get this on the server side:
I think this causes disconnections.
0.1.2