yggdrasil-network / yggdrasil-go

An experiment in scalable routing as an encrypted IPv6 overlay network
https://yggdrasil-network.github.io

Frequent "bind: address already in use" messages. #1173

Open jgoerzen opened 1 month ago

jgoerzen commented 1 month ago

I've got an issue with Yggdrasil 0.5.8. It doesn't start right away, but eventually I get log spam every second with:

Not multicasting on wlp170s0 due to error: listen tcp [fe80::redacted%wlp170s0]:58183: bind: address already in use

Now nothing except Yggdrasil has that port open, and it is only used for the MulticastInterfaces section in yggdrasil.conf. netstat -anp does show that Yggdrasil is listening on that port, despite the logspam. Not sure what causes it; perhaps the interface dropping and reappearing after moving the laptop to a different wifi network or something?

jgoerzen commented 1 month ago

The multicast section of my config looks like this:

  MulticastInterfaces:
  [
    {
      Regex: "^wlp|^enx"
      Beacon: true
      Listen: true
      Port: 38153
      Priority: 0
      Password: "redacted"
    }
  ]

38153 doesn't occur anywhere else in the file.

neilalexander commented 3 weeks ago

Can you please try the latest develop commits to see if this is improved?

jgoerzen commented 3 weeks ago

Yes, I will get to that either today or tomorrow. Thanks!

jgoerzen commented 3 weeks ago

Initial experience suggests it's fixed. I'll keep an eye on it for another day or so yet.

jgoerzen commented 3 weeks ago

I should add that I'm seeing some issues with the newer Yggdrasil maintaining connections; it's a bit unstable when talking to my other 0.5.8 systems. That may be completely unrelated; I just thought I'd mention it.

neilalexander commented 3 weeks ago

What kind of problems? As in the peerings drop?

jgoerzen commented 3 weeks ago

Right. It produces excessive packet loss (~50% according to ping). Going back to the stable version resolved the issue instantly.

It seems to be intermittent; I can't always reproduce it easily.

neilalexander commented 3 weeks ago

Any log entries?

The routing algorithm has changed a little to take link cost (which is based on a smoothed average of the RTT) into account in some cases, so it might be that the path taken by the new code is different to the current release...

jgoerzen commented 3 weeks ago

Just a lot of "Connected (in|out)bound" and "Disconnected (in|out)bound".

Looking at the timestamps, maybe it just needed a minute or two longer than 0.5.8 to settle after resuming from sleep. That might be possible.

Now that I look at it, when on 0.5.8 (both before and after this test, but not during!), I had periodic restarts due to:

Oct 04 11:11:42 sgo2 yggdrasil[1638]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x7f2793]
Oct 04 11:11:42 sgo2 yggdrasil[1638]: goroutine 8684 [running]:
Oct 04 11:11:42 sgo2 yggdrasil[1638]: github.com/yggdrasil-network/yggdrasil-go/src/core.(*Core)._close.(*links).shutdown.func1()
Oct 04 11:11:42 sgo2 yggdrasil[1638]:         github.com/yggdrasil-network/yggdrasil-go/src/core/link.go:111 +0xd3
Oct 04 11:11:42 sgo2 yggdrasil[1638]: github.com/Arceliar/phony.(*Inbox).run(0xc000222548)
Oct 04 11:11:42 sgo2 yggdrasil[1638]:         github.com/Arceliar/phony/actor.go:98 +0x2d
Oct 04 11:11:42 sgo2 yggdrasil[1638]: created by github.com/Arceliar/phony.(*Inbox).restart in goroutine 8683
Oct 04 11:11:42 sgo2 yggdrasil[1638]:         github.com/Arceliar/phony/actor.go:132 +0x4f

neilalexander commented 3 weeks ago

Did the Connected/Disconnected lines have a reason at the end? If they didn't, then they are typically just regular io.EOF errors, which suggests the other side hung up/stopped/crashed etc., so I'm wondering if there's something else going on here.

jgoerzen commented 3 weeks ago

Nope, no reason given. And there's no obvious cause for it to consider the other ends unstable, even the LAN peers; they were all up and running as usual.