Closed: jkralik closed this issue 1 month ago
The issue is here. Every time we get a new inbound stream we block until the handshake is complete. This behavior also isn't a regression (f7a68a36523bf733b1c4139f88e109b4b31e02f8).
We should look at how `crypto/tls` buffers internally so that handshakes don't block either. I don't know what the API should look like. I don't know if this is an established Go pattern, but you could call `Accept` n times; that is one way you could accomplish the concurrency. Let me confirm first.
@backkem any ideas?
https://gist.github.com/Sean-Der/a3a6abbe9b78ced613590312a18dc8a4
@jkralik @boaks @jdbruijn this worked ^. If one server process hangs, the other 4 work just fine. If I remove the change in the example, one hanging client hurts everything.

You just need to decide how many goroutines you are willing to spawn. Maybe this is something Pion should do automatically? I am not sure what is idiomatic.
I've checked in my test setup and it also works there. See the README of the repo I just referenced this in. I will be testing this in my application code, but it should work for the time being at least. @Sean-Der Thanks so much for the quick analysis and fix.
> We should look at how `crypto/tls` buffers internally so that handshakes don't block either.
I have no idea what is idiomatic, but I'll have a look at `crypto/tls` to see what they do. I can't imagine that handshakes would be blocking there.
> You just need to decide how many goroutines you are willing to spawn.
I'll have a look at how goroutines work exactly and what spawning N of them means in CPU, RAM, etc. to determine how many I'd be willing to spawn.
For reference to anyone reading: in most applications this wouldn't be much of an issue, since a handshake over an internet connection completes relatively quickly, so context deadlines can be configured to be very tight (e.g. 100 ms). With embedded devices the full handshake generally takes 5 seconds, which blocks for much longer even when nothing hangs, and 500 devices may be trying to make a connection at the same time.
I'm probably misunderstanding the problem, but can't we use a buffered channel to get around this?
```go
conns := make(chan net.Conn, 100)
hub := util.NewHub()
go func(l net.Listener) {
	for {
		conn, err := l.Accept()
		if err != nil {
			// .. do something ..
			return
		}
		conns <- conn
	}
}(listener)
go hub.Chat()
for c := range conns {
	hub.Register(c)
}
```
Oh hm, the Accept is on dtls.Listener, so that triggers the whole handshake machinery. We should be able to Accept() on the underlying listener first and then spin the rest off into a goroutine? I suppose the problem is this: https://github.com/pion/dtls/blob/c6dbd482fa3a0a7655e9dc338a0f15be905cdf35/listener.go#L65? We do this inline, instead of spawning a goroutine there that would then take care of the handshake?
@Sean-Der From my point of view, the goroutines should be spawned below, in the underlying layer of dtls/listener, because we want to have an API similar to tls.Listener, and it is more convenient to use. Of course, there must be an option to limit the number of spawned goroutines.
@daenney yep exactly!
@jkralik agree!
`Server` in `crypto/tls` doesn't actually do anything, so we need to match this API. Instead, the first call to `Read` or `Write` should start the handshake.

It looks like we will need to do a v3 and add `Handshake`. I can't believe I missed this before :( This is a pretty big breaking change, but I don't think it will be difficult to do!
> For reference to anyone reading: In most applications this wouldn't be that much of an issue since the handshake over an internet connection goes relatively quickly, so context deadlines can be configured to be very tight (e.g. 100 ms). With embedded devices, the full handshake generally takes 5 seconds, which is much more blocking even if it doesn't hang and 500 devices are trying to make a connection at the same time.
I'm not sure when exactly the spawned goroutine is required. If it is already required for RFC 6347, Section 4.2.1 (Denial-of-Service Countermeasures), I think that violates the intent of "In order to counter both of these attacks, DTLS borrows the stateless cookie technique used by Photuris [PHOTURIS] and IKE [IKEv2]." Each (spoofed) ClientHello could then trigger/consume such a goroutine. Even if that number is limited, it would make it easy for a spoofer to perform a DoS attack.
> I'm not sure when exactly the spawned goroutine is required.
@boaks Each `Accept` call would be in one. You would do a pre-allocated pool, so I don't think the initial `ClientHello` will cause a problem!
The API difference is that `crypto/tls` does the handshake on the first call to `Read` or `Write`; `pion/dtls` does it on the call to `Accept`. All the blocking occurs in `Read` and `Write`, which users usually move to a goroutine.

If users do a pool and call `Accept` multiple times they can handle multiple handshakes at once. That is the best option to unblock right away.
Going forward we have two choices to make the API behave exactly like `crypto/tls`:

1. Do the `Handshake` in the first call to `Read`/`Write`, behind an option. This adds more complexity, but is not a breaking change!
2. Don't do the `Handshake` when creating the connection, only when doing the first `Read`/`Write` or when calling `Handshake` explicitly. One thing that worries me is that this could encourage users to spawn a goroutine each time a ClientHello is received. The user will do the call to `Read`/`Write`/`Handshake` in a new goroutine.
If we were to take option 1, realistically we'd want that option on by default since the current behaviour is undesirable. But we can't flip it on by default since it changes our behaviour quite a bit and we don't know if folks are relying on it somehow. That would result in us having to do a major anyway to be able to flip the default for that option, at which point I'd rather we break the API and align on `crypto/tls` than take on the tech debt.
Is there any progress on this? We are accepting DTLS connections from potentially many thousands of clients (IoT devices) on low-bandwidth connections with high timeouts, which is the worst case for this scenario...
My experience with (too) many CoAP and/or DTLS stacks is that either they are intended and developed to be used for IoT, or you may face much more pain than you expect. That doesn't mean other implementations are bad; it just means they don't really support IoT.
Hi @niondir

Instead of doing your `Handshake` or first `Read`/`Write` in a goroutine, create your `dtls.Conn` in one. Even when we change the behavior to match `crypto/tls`, you still have to handle the concurrent handshakes.

Happy to create a JSFiddle showing how to do this. Even when this issue is closed, though, I don't think this library will change dramatically; we are just moving when something blocks.
thanks!
FWIW, I'm doing something like the example below in our DTLS server for IoT devices to create 1000 handshake workers/listeners, also using go-coap. It has been working great for over a year now.
```go
func (l *DTLSListener) acceptLoop() {
	defer l.wg.Done()
	for i := 0; i < 1000; i++ {
		go func() {
			for {
				conn, err := l.listener.Accept()
				select {
				case l.connCh <- connData{conn: conn, err: err}:
				case <-l.doneCh:
					return
				}
			}
		}()
	}
}
```
This is a pretty big issue. From my experience, one bad handshaker and a wonky config can pretty much take down the entire server (default 30 s handshake timeout, while the client may have a much shorter one like 5 s; client handshakes build up indefinitely).
I will submit a PR, but I propose:
```go
type Handshaker interface {
	Handshake() (net.Conn, error)
}

type handshaker struct {
	inner net.Conn
	conf  *Config
}

func (h handshaker) Handshake() (net.Conn, error) {
	return Server(h.inner, h.conf)
}

func (ln *listener) AcceptHandshaker() (Handshaker, error) {
	inner, err := ln.parent.Accept()
	return handshaker{inner, ln.config}, err
}
```
The `Handshake` method can then be called in a background goroutine.
Your environment.
https://github.com/go-ocf/go-coap/issues/109
What did you do?
Just look at the last comment. But it seems that the handshakes wait on each other.
What did you expect?
dtls.Listener is able to process parallel handshakes.