pion / dtls

DTLS 1.2 Server/Client implementation for Go
https://pion.ly/
MIT License

Listener doesn't process parallel handshakes #279

Closed jkralik closed 1 month ago

jkralik commented 4 years ago

Your environment.

https://github.com/go-ocf/go-coap/issues/109

What did you do?

Just look at the last comment. It seems that the handshakes wait for each other.

What did you expect?

dtls.Listener is able to process parallel handshakes.

Sean-Der commented 4 years ago

The issue is here: every time we get a new inbound stream, we block until the handshake is complete. This behavior isn't a regression either; it is the same in f7a68a36523bf733b1c4139f88e109b4b31e02f8.

We should look at how crypto/tls buffers internally so that handshakes don't block either. I don't know what the API should look like. I don't know if this is an established Go pattern, but you could call Accept n times; that is one way you could accomplish the concurrency. Let me confirm first.

@backkem any ideas?

Sean-Der commented 4 years ago

https://gist.github.com/Sean-Der/a3a6abbe9b78ced613590312a18dc8a4

@jkralik @boaks @jdbruijn this worked ^. If one Server process hangs, the other 4 work just fine. If I remove the change from the example, one hanging client hurts everything.

You just need to decide how many goroutines you are willing to spawn. Maybe this should be something that Pion does automatically? I am not sure what is idiomatic.
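The pattern being described is roughly the following (a minimal sketch; the worker count and the handle callback are placeholders, and the listener is whatever dtls.Listen returned):

package example

import "net"

// acceptWorkers starts n goroutines that each call Accept on the given
// listener (for example the one returned by dtls.Listen). Because Accept
// on a dtls.Listener blocks until that client's handshake completes,
// running n accept loops lets up to n handshakes progress in parallel,
// so one hung client only ties up one worker instead of the whole server.
func acceptWorkers(l net.Listener, n int, handle func(net.Conn)) {
    for i := 0; i < n; i++ {
        go func() {
            for {
                conn, err := l.Accept()
                if err != nil {
                    return // listener closed or permanently broken
                }
                go handle(conn)
            }
        }()
    }
}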

jdbruijn commented 4 years ago

I've checked in my test setup and it also works there. See the README of the repo I've just referenced this in. I will be testing this in my application code, but it should work for the time being at least. @Sean-Der Thanks so much for the quick analysis and fix.

We should look at how crypto/tls buffers internally so that handshakes don't block either.

I have no idea what is idiomatic either, but I'll have a look at crypto/tls to see what they do. I can't imagine that handshakes would be blocking there.

You just need to decide how many goroutines you are willing to spawn.

I'll have a look at how the goroutines work exactly and what spawning N of them means in CPU, RAM, etc., to determine how many I'd be willing to spawn.

For reference to anyone reading: in most applications this wouldn't be much of an issue, since a handshake over an internet connection completes relatively quickly, so context deadlines can be configured to be very tight (e.g. 100 ms). With embedded devices the full handshake generally takes 5 seconds, which blocks for much longer even when nothing hangs, and 500 devices may be trying to connect at the same time.

daenney commented 4 years ago

I'm probably misunderstanding the problem, but can't we use a buffered channel to get around this?


conns := make(chan net.Conn, 100)

hub := util.NewHub()
go hub.Chat()

go func(l net.Listener) {
    for {
        conn, err := l.Accept()
        if err != nil {
            // .. do something ..
            return
        }
        conns <- conn
    }
}(listener)

for c := range conns {
    hub.Register(c)
}

daenney commented 4 years ago

Oh, hmm, the Accept is on dtls.Listener, so that triggers the whole handshake machinery. We should be able to Accept() on the underlying listener first and then spin the rest off into a goroutine? I suppose the problem is this: https://github.com/pion/dtls/blob/c6dbd482fa3a0a7655e9dc338a0f15be905cdf35/listener.go#L65? We do this inline, instead of spawning a goroutine there that would then take care of the handshake?
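A rough sketch of that idea, written as a wrapper around the underlying listener rather than as the actual listener.go change (the type, field names, and channels are assumptions, and the dtls.Server signature should be checked against your version):

package example

import (
    "net"

    "github.com/pion/dtls/v2"
)

// handshakingListener accepts raw connections inline but runs each DTLS
// handshake in its own goroutine, so one slow or hostile client cannot
// stall Accept for everyone else.
type handshakingListener struct {
    parent net.Listener
    config *dtls.Config
    conns  chan net.Conn
    errs   chan error
}

func (l *handshakingListener) run() {
    for {
        raw, err := l.parent.Accept()
        if err != nil {
            l.errs <- err
            return
        }
        go func() {
            // Only this goroutine blocks while the handshake runs.
            c, err := dtls.Server(raw, l.config)
            if err != nil {
                raw.Close()
                return
            }
            l.conns <- c
        }()
    }
}

// Accept hands out connections whose handshake has already finished.
func (l *handshakingListener) Accept() (net.Conn, error) {
    select {
    case c := <-l.conns:
        return c, nil
    case err := <-l.errs:
        return nil, err
    }
}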

jkralik commented 4 years ago

@Sean-Der From my point of view, the goroutines should be spawned below, in the underlying layer of dtls/listener, because we want to have an API similar to tls.Listener and it is more convenient to use. Of course there must be an option to limit the number of spawned goroutines.
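One way to add such a limit, assuming the handshake is spawned per accepted connection as sketched above, is a buffered-channel semaphore (the cap and the ready channel are illustrative):

package example

import (
    "net"

    "github.com/pion/dtls/v2"
)

// acceptWithLimit accepts raw connections and performs each DTLS handshake
// in its own goroutine, but never runs more than maxHandshakes of them at
// once: the buffered channel acts as a counting semaphore.
func acceptWithLimit(parent net.Listener, config *dtls.Config, maxHandshakes int, ready chan<- net.Conn) error {
    sem := make(chan struct{}, maxHandshakes)
    for {
        raw, err := parent.Accept()
        if err != nil {
            return err
        }
        sem <- struct{}{} // blocks while maxHandshakes handshakes are in flight
        go func(raw net.Conn) {
            defer func() { <-sem }()
            c, err := dtls.Server(raw, config)
            if err != nil {
                raw.Close()
                return
            }
            ready <- c
        }(raw)
    }
}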

Sean-Der commented 4 years ago

@daenney yep exactly!

@jkralik agree!


Server in crypto/tls doesn't actually do anything, so we need to match that API. Instead, the first call to Read or Write should start the handshake.
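For comparison, this is the crypto/tls behaviour being referenced: tls.Server only wraps the connection, and the handshake runs lazily on an explicit Handshake call or on the first Read/Write (the handler body here is a placeholder):

package example

import (
    "crypto/tls"
    "log"
    "net"
)

// serve shows the crypto/tls pattern: Accept and tls.Server return without
// doing any handshake I/O, and the handshake only blocks inside the
// per-connection goroutine, on the explicit Handshake call below (it would
// also run implicitly on the first Read or Write).
func serve(ln net.Listener, config *tls.Config) {
    for {
        raw, err := ln.Accept()
        if err != nil {
            return
        }
        conn := tls.Server(raw, config) // no handshake yet
        go func() {
            defer conn.Close()
            if err := conn.Handshake(); err != nil {
                log.Println("handshake failed:", err)
                return
            }
            // ... read from / write to conn ...
        }()
    }
}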

Sean-Der commented 4 years ago

It looks like we will need to do a v3 and add Handshake

I can't believe I missed this before :( this is a pretty big breaking change, but I don't think it will be difficult to do!

boaks commented 4 years ago

For reference to anyone reading: in most applications this wouldn't be much of an issue, since a handshake over an internet connection completes relatively quickly, so context deadlines can be configured to be very tight (e.g. 100 ms). With embedded devices the full handshake generally takes 5 seconds, which blocks for much longer even when nothing hangs, and 500 devices may be trying to connect at the same time.

I'm not sure when exactly the spawned goroutine is required. If one is already required for RFC 6347, section 4.2.1 (Denial-of-Service Countermeasures), I think it violates the intent of "In order to counter both of these attacks, DTLS borrows the stateless cookie technique used by Photuris [PHOTURIS] and IKE [IKEv2]." Each (spoofed) CLIENT_HELLO could then trigger/consume such a goroutine. Even if that number is limited, it would make it easy for a spoofer to perform a DoS attack.

Sean-Der commented 4 years ago

I'm not sure when exactly the spawned goroutine is required.

@boaks Each Accept call would run in its own goroutine. You would use a pre-allocated pool, so I don't think the initial ClientHello will cause a problem!

The API difference is that crypto/tls does the handshake on the first call to Read or Write. pion/dtls does it on the call to Accept. All the blocking occurs in Read and Write which users usually move to a goroutine.


If users do a pool and call Accept multiple times they can handle multiple handshakes at once. That is the best option to unblock right away.


Going forward we have two choices to make the API behave exactly like crypto/tls.

One thing that worries me is that this could encourage users to spawn a goroutine each time a ClientHello is received. The user will do the call to Read/Write/Handshake in a new goroutine.
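Either way, the per-connection blocking work ends up in its own goroutine; a minimal handler sketch (the buffer size and the echo behaviour are just placeholders):

package example

import "net"

// handle is the usual per-connection goroutine: once a fully handshaked
// connection is available, all further blocking (Read/Write) happens here
// and never stalls the accept loop.
func handle(conn net.Conn) {
    defer conn.Close()
    buf := make([]byte, 8192)
    for {
        n, err := conn.Read(buf)
        if err != nil {
            return
        }
        if _, err := conn.Write(buf[:n]); err != nil { // echo back
            return
        }
    }
}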

daenney commented 3 years ago

If we were to take option 1, realistically we'd want that option on by default, since the current behaviour is undesirable. But we can't flip it on by default, since it changes our behaviour quite a bit and we don't know if folks are relying on it somehow. That would result in us having to do a major release anyway to be able to flip the default for that option, at which point I'd rather we break the API and align with crypto/tls than take on the tech debt.

niondir commented 2 years ago

Is there any progress on this? We are accepting DTLS connections from potentially many thousands of clients (IoT devices) on low-bandwidth connections with high timeouts, which is the worst case for this scenario...

boaks commented 2 years ago

My experience with (too) many CoAP and/or DTLS stacks is: either the stack is intended and developed to be used for IoT, or you may face much more pain than you expect. That doesn't mean other implementations are bad, it just means they don't really support IoT.

Sean-Der commented 2 years ago

Hi @niondir

Instead of doing your Handshake or first Read/Write in a goroutine, create your dtls.Conn in one. Even when we change the behavior to match crypto/tls, you still have to handle the concurrent handshakes.

Happy to create a JSFiddle showing how to do this. When this issue is closed, though, I don't think this library will change dramatically; we are just moving when something blocks.

thanks!

jdbruijn commented 2 years ago

FWIW, I'm doing something like the example below in our DTLS server for IoT devices to create 1000 handshake workers/listeners, also using go-coap. It has been working great for over a year now.

func (l *DTLSListener) acceptLoop() {
    defer l.wg.Done()
    // Spawn 1000 workers; each blocks in Accept (i.e. in a handshake)
    // independently, so one slow client only occupies one worker.
    for i := 0; i < 1000; i++ {
        go func() {
            for {
                conn, err := l.listener.Accept()
                select {
                case l.connCh <- connData{conn: conn, err: err}:
                case <-l.doneCh:
                    return
                }
            }
        }()
    }
}
jordan-bonecutter commented 5 months ago

This is a pretty big issue. From my experience, one bad handshaker and a wonky config can pretty much take down the entire server (default 30 s handshake timeout, while the client may have a much shorter one like 5 s; client handshakes build up indefinitely).

I will submit a PR, but I propose:

type Handshaker interface {
  Handshake() (net.Conn, error)
}

type handshaker struct {
  inner net.Conn
  conf  *Config
}

func (h handshaker) Handshake() (net.Conn, error) {
  return Server(h.inner, h.conf)
}

func (ln *listener) AcceptHandshaker() (Handshaker, error) {
  inner, err := ln.parent.Accept()
  if err != nil {
    return nil, err
  }
  return handshaker{inner, ln.config}, nil
}

The Handshake method can then be called in a background goroutine.
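A caller could then use the proposed interface roughly like this (hypothetical until such a PR lands; the accept-side interface and the handle callback are assumptions of this sketch):

package example

import (
    "log"
    "net"
)

// Handshaker mirrors the interface proposed above.
type Handshaker interface {
    Handshake() (net.Conn, error)
}

// handshakeAccepter is the accept-side counterpart assumed by this sketch.
type handshakeAccepter interface {
    AcceptHandshaker() (Handshaker, error)
}

func serve(ln handshakeAccepter, handle func(net.Conn)) {
    for {
        h, err := ln.AcceptHandshaker()
        if err != nil {
            return
        }
        go func() {
            // Only this goroutine blocks on the handshake, so a client that
            // stalls here no longer delays every other client.
            conn, err := h.Handshake()
            if err != nil {
                log.Println("handshake failed:", err)
                return
            }
            handle(conn)
        }()
    }
}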