libp2p / go-libp2p

libp2p implementation in Go
MIT License
5.97k stars, 1.05k forks

peer connections limit #2220

Closed: SOVLOOKUP closed this issue 1 year ago

SOVLOOKUP commented 1 year ago

When the number of connected peers rises above about 120, the node refuses new connections with this error:

panic: failed to dial 12D3KooWSd5c6vcozontrsQYpATbXzJUZGGqyC7DuNVeJPkpDdzo:
  * [/ip4/192.168.150.40/tcp/34787] dial backoff

The error can be reproduced with: https://github.com/SOVLOOKUP/yunos/tree/libp2p

// https://github.com/SOVLOOKUP/yunos/blob/libp2p/main.go
// server.New and newClient are defined in the linked repository.
func main() {
    ctx := context.Background()
    s, err := libp2p.New()
    if err != nil {
        panic(err)
    }
    yuns := server.New(ctx, &s)

    for i := 0; i < 1000; i++ {
        time.Sleep(time.Second / 10)
        go func() {
            err := newClient(yuns.Addr())
            if err != nil {
                panic(err)
            }
        }()
    }
}

Version Information
➜  yunos git:(libp2p) ✗ go list -m all
github.com/sovlookup/yunos
cloud.google.com/go v0.65.0
cloud.google.com/go/bigquery v1.8.0
cloud.google.com/go/datastore v1.1.0
cloud.google.com/go/pubsub v1.3.1
cloud.google.com/go/storage v1.10.0
dmitri.shuralyov.com/app/changes v0.0.0-20180602232624-0a106ad413e3
dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9
dmitri.shuralyov.com/html/belt v0.0.0-20180602232347-f7d459c86be0
dmitri.shuralyov.com/service/change v0.0.0-20181023043359-a85b471d5412
dmitri.shuralyov.com/state v0.0.0-20180228185332-28bcc343414c
git.apache.org/thrift.git v0.0.0-20180902110319-2566ecd5d999
github.com/AndreasBriese/bbloom v0.0.0-20190825152654-46b345b51c96
github.com/BurntSushi/toml v0.3.1
github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802
github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751
github.com/alecthomas/units v0.0.0-20190924025748-f65c72e2690d
github.com/andybalholm/brotli v1.0.4
github.com/anmitsu/go-shlex v0.0.0-20161002113705-648efa622239
github.com/benbjohnson/clock v1.3.0
github.com/beorn7/perks v1.0.1
github.com/bradfitz/go-smtpd v0.0.0-20170404230938-deb6d6237625
github.com/buger/jsonparser v0.0.0-20181115193947-bf1c66bbce23
github.com/census-instrumentation/opencensus-proto v0.2.1
github.com/cespare/xxhash v1.1.0
github.com/cespare/xxhash/v2 v2.2.0
github.com/chzyer/logex v1.1.10
github.com/chzyer/readline v1.5.0
github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1
github.com/cilium/ebpf v0.4.0
github.com/client9/misspell v0.3.4
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f
github.com/containerd/cgroups v1.0.4
github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d
github.com/coreos/go-systemd/v22 v22.5.0
github.com/cpuguy83/go-md2man/v2 v2.0.0
github.com/davecgh/go-spew v1.1.1
github.com/davidlazar/go-crypto v0.0.0-20200604182044-b73af7476f6c
github.com/decred/dcrd/crypto/blake256 v1.0.0
github.com/decred/dcrd/dcrec/secp256k1/v4 v4.1.0
github.com/dgraph-io/badger v1.6.2
github.com/dgraph-io/ristretto v0.0.2
github.com/docker/go-units v0.5.0
github.com/duke-git/lancet/v2 v2.1.16
github.com/dustin/go-humanize v1.0.0
github.com/elastic/gosigar v0.14.2
github.com/envoyproxy/go-control-plane v0.9.4
github.com/envoyproxy/protoc-gen-validate v0.1.0
github.com/fasthttp/websocket v1.5.1
github.com/flynn/go-shlex v0.0.0-20150515145356-3f9db97f8568
github.com/flynn/noise v1.0.0
github.com/francoispqt/gojay v1.2.13
github.com/fsnotify/fsnotify v1.5.4
github.com/ghodss/yaml v1.0.0
github.com/gin-contrib/sse v0.1.0
github.com/gin-gonic/gin v1.6.3
github.com/gliderlabs/ssh v0.1.1
github.com/go-errors/errors v1.0.1
github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20200222043503-6f7a984d4dc4
github.com/go-kit/kit v0.9.0
github.com/go-kit/log v0.2.0
github.com/go-logfmt/logfmt v0.5.1
github.com/go-logr/logr v1.2.3
github.com/go-playground/assert/v2 v2.0.1
github.com/go-playground/locales v0.13.0
github.com/go-playground/universal-translator v0.17.0
github.com/go-playground/validator/v10 v10.2.0
github.com/go-stack/stack v1.8.0
github.com/go-task/slim-sprig v0.0.0-20210107165309-348f09dbbbc0
github.com/gobwas/httphead v0.0.0-20180130184737-2c6c146eadee
github.com/gobwas/pool v0.2.0
github.com/gobwas/ws v1.0.2
github.com/godbus/dbus/v5 v5.1.0
github.com/gofiber/fiber/v2 v2.42.0
github.com/gofiber/websocket/v2 v2.1.4
github.com/gogo/protobuf v1.3.2
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e
github.com/golang/lint v0.0.0-20180702182130-06c8688daad7
github.com/golang/mock v1.6.0
github.com/golang/protobuf v1.5.2
github.com/golang/snappy v0.0.0-20180518054509-2e65f85255db
github.com/google/btree v1.0.0
github.com/google/go-cmp v0.5.9
github.com/google/go-github v17.0.0+incompatible
github.com/google/go-querystring v1.0.0
github.com/google/gofuzz v1.0.0
github.com/google/gopacket v1.1.19
github.com/google/martian v2.1.0+incompatible
github.com/google/martian/v3 v3.0.0
github.com/google/pprof v0.0.0-20221203041831-ce31453925ec
github.com/google/renameio v0.1.0
github.com/google/uuid v1.3.0
github.com/googleapis/gax-go v2.0.0+incompatible
github.com/googleapis/gax-go/v2 v2.0.5
github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1
github.com/gorilla/websocket v1.4.1
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7
github.com/grpc-ecosystem/grpc-gateway v1.5.0
github.com/hashicorp/golang-lru v0.5.1
github.com/hashicorp/golang-lru/v2 v2.0.1
github.com/huin/goupnp v1.0.3
github.com/huin/goutil v0.0.0-20170803182201-1ca381bf3150
github.com/ianlancetaylor/demangle v0.0.0-20220319035150-800ac71e25c2
github.com/ipfs/go-cid v0.3.2
github.com/ipfs/go-datastore v0.6.0
github.com/ipfs/go-detect-race v0.0.1
github.com/ipfs/go-ds-badger v0.3.0
github.com/ipfs/go-ds-leveldb v0.5.0
github.com/ipfs/go-ipfs-util v0.0.2
github.com/ipfs/go-log/v2 v2.5.1
github.com/jackpal/go-nat-pmp v1.0.2
github.com/jbenet/go-temp-err-catcher v0.1.0
github.com/jbenet/goprocess v0.1.4
github.com/jellevandenhooff/dkim v0.0.0-20150330215556-f50fe3d243e1
github.com/jpillora/backoff v1.0.0
github.com/json-iterator/go v1.1.12
github.com/jstemmer/go-junit-report v0.9.1
github.com/julienschmidt/httprouter v1.3.0
github.com/kisielk/errcheck v1.5.0
github.com/kisielk/gotool v1.0.0
github.com/klauspost/compress v1.15.12
github.com/klauspost/cpuid/v2 v2.2.1
github.com/konsorten/go-windows-terminal-sequences v1.0.3
github.com/koron/go-ssdp v0.0.3
github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515
github.com/kr/pretty v0.2.1
github.com/kr/pty v1.1.3
github.com/kr/text v0.1.0
github.com/leodido/go-urn v1.2.0
github.com/libp2p/go-buffer-pool v0.1.0
github.com/libp2p/go-cidranger v1.1.0
github.com/libp2p/go-flow-metrics v0.1.0
github.com/libp2p/go-libp2p v0.26.3
github.com/libp2p/go-libp2p-asn-util v0.2.0
github.com/libp2p/go-libp2p-testing v0.12.0
github.com/libp2p/go-mplex v0.7.0
github.com/libp2p/go-msgio v0.3.0
github.com/libp2p/go-nat v0.1.0
github.com/libp2p/go-netroute v0.2.1
github.com/libp2p/go-reuseport v0.2.0
github.com/libp2p/go-sockaddr v0.0.2
github.com/libp2p/go-yamux/v4 v4.0.0
github.com/libp2p/zeroconf/v2 v2.2.0
github.com/lunixbochs/vtclean v1.0.0
github.com/mailru/easyjson v0.0.0-20190312143242-1de009706dbe
github.com/marten-seemann/tcp v0.0.0-20210406111302-dfbc87cc63fd
github.com/mattn/go-colorable v0.1.13
github.com/mattn/go-isatty v0.0.17
github.com/mattn/go-runewidth v0.0.14
github.com/matttproud/golang_protobuf_extensions v1.0.4
github.com/microcosm-cc/bluemonday v1.0.1
github.com/miekg/dns v1.1.50
github.com/mikioh/tcp v0.0.0-20190314235350-803a9b46060c
github.com/mikioh/tcpinfo v0.0.0-20190314235526-30a79bb1804b
github.com/mikioh/tcpopt v0.0.0-20190314235656-172688c1accc
github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1
github.com/minio/sha256-simd v1.0.0
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
github.com/modern-go/reflect2 v1.0.2
github.com/mr-tron/base58 v1.2.0
github.com/multiformats/go-base32 v0.1.0
github.com/multiformats/go-base36 v0.2.0
github.com/multiformats/go-multiaddr v0.8.0
github.com/multiformats/go-multiaddr-dns v0.3.1
github.com/multiformats/go-multiaddr-fmt v0.1.0
github.com/multiformats/go-multibase v0.1.1
github.com/multiformats/go-multicodec v0.7.0
github.com/multiformats/go-multihash v0.2.1
github.com/multiformats/go-multistream v0.4.1
github.com/multiformats/go-varint v0.0.7
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f
github.com/neelance/astrewrite v0.0.0-20160511093645-99348263ae86
github.com/neelance/sourcemap v0.0.0-20151028013722-8c68805598ab
github.com/onsi/ginkgo v1.16.5
github.com/onsi/ginkgo/v2 v2.5.1
github.com/onsi/gomega v1.24.0
github.com/opencontainers/runtime-spec v1.0.2
github.com/openzipkin/zipkin-go v0.1.1
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58
github.com/philhofer/fwd v1.1.1
github.com/pkg/errors v0.9.1
github.com/pmezard/go-difflib v1.0.0
github.com/prometheus/client_golang v1.14.0
github.com/prometheus/client_model v0.3.0
github.com/prometheus/common v0.37.0
github.com/prometheus/procfs v0.8.0
github.com/quic-go/qpack v0.4.0
github.com/quic-go/qtls-go1-18 v0.2.0
github.com/quic-go/qtls-go1-19 v0.2.1
github.com/quic-go/qtls-go1-20 v0.1.1
github.com/quic-go/quic-go v0.33.0
github.com/quic-go/webtransport-go v0.5.2
github.com/raulk/go-watchdog v1.3.0
github.com/rivo/uniseg v0.2.0
github.com/rogpeppe/go-internal v1.3.0
github.com/russross/blackfriday v1.5.2
github.com/russross/blackfriday/v2 v2.0.1
github.com/savsgio/dictpool v0.0.0-20221023140959-7bf2e61cea94
github.com/savsgio/gotils v0.0.0-20220530130905-52f3993e8d6d
github.com/sergi/go-diff v1.0.0
github.com/shurcooL/component v0.0.0-20170202220835-f88ec8f54cc4
github.com/shurcooL/events v0.0.0-20181021180414-410e4ca65f48
github.com/shurcooL/github_flavored_markdown v0.0.0-20181002035957-2122de532470
github.com/shurcooL/go v0.0.0-20180423040247-9e1955d9fb6e
github.com/shurcooL/go-goon v0.0.0-20170922171312-37c2f522c041
github.com/shurcooL/gofontwoff v0.0.0-20180329035133-29b52fc0a18d
github.com/shurcooL/gopherjslib v0.0.0-20160914041154-feb6d3990c2c
github.com/shurcooL/highlight_diff v0.0.0-20170515013008-09bb4053de1b
github.com/shurcooL/highlight_go v0.0.0-20181028180052-98c3abbbae20
github.com/shurcooL/home v0.0.0-20181020052607-80b7ffcb30f9
github.com/shurcooL/htmlg v0.0.0-20170918183704-d01228ac9e50
github.com/shurcooL/httperror v0.0.0-20170206035902-86b7830d14cc
github.com/shurcooL/httpfs v0.0.0-20171119174359-809beceb2371
github.com/shurcooL/httpgzip v0.0.0-20180522190206-b1c53ac65af9
github.com/shurcooL/issues v0.0.0-20181008053335-6292fdc1e191
github.com/shurcooL/issuesapp v0.0.0-20180602232740-048589ce2241
github.com/shurcooL/notifications v0.0.0-20181007000457-627ab5aea122
github.com/shurcooL/octicon v0.0.0-20181028054416-fa4f57f9efb2
github.com/shurcooL/reactions v0.0.0-20181006231557-f2e0b4ca5b82
github.com/shurcooL/sanitized_anchor_name v1.0.0
github.com/shurcooL/users v0.0.0-20180125191416-49c67e49c537
github.com/shurcooL/webdavfs v0.0.0-20170829043945-18c3829fa133
github.com/sirupsen/logrus v1.8.1
github.com/sourcegraph/annotate v0.0.0-20160123013949-f4cad6c6324d
github.com/sourcegraph/jsonrpc2 v0.2.0
github.com/sourcegraph/syntaxhighlight v0.0.0-20170531221838-bd320f5d308e
github.com/spaolacci/murmur3 v1.1.0
github.com/stretchr/objx v0.1.1
github.com/stretchr/testify v1.8.1
github.com/syndtr/goleveldb v1.0.0
github.com/tarm/serial v0.0.0-20180830185346-98f6abe2eb07
github.com/tinylib/msgp v1.1.6
github.com/ugorji/go v1.1.7
github.com/ugorji/go/codec v1.1.7
github.com/urfave/cli v1.22.2
github.com/valyala/bytebufferpool v1.0.0
github.com/valyala/fasthttp v1.44.0
github.com/valyala/tcplisten v1.0.0
github.com/viant/assertly v0.4.8
github.com/viant/toolbox v0.24.0
github.com/yuin/goldmark v1.4.13
go.opencensus.io v0.22.4
go.uber.org/atomic v1.10.0
go.uber.org/dig v1.15.0
go.uber.org/fx v1.18.2
go.uber.org/goleak v1.1.12
go.uber.org/multierr v1.8.0
go.uber.org/zap v1.24.0
go4.org v0.0.0-20180809161055-417644f6feb5
golang.org/x/build v0.0.0-20190111050920-041ab4dc3f9d
golang.org/x/crypto v0.4.0
golang.org/x/exp v0.0.0-20221208152030-732eee02a75a
golang.org/x/image v0.0.0-20190802002840-cff245a6509b
golang.org/x/lint v0.0.0-20200302205851-738671d3881b
golang.org/x/mobile v0.0.0-20190719004257-d2bd2a29d028
golang.org/x/mod v0.7.0
golang.org/x/net v0.4.0
golang.org/x/oauth2 v0.0.0-20220223155221-ee480838109b
golang.org/x/perf v0.0.0-20180704124530-6e6d33e29852
golang.org/x/sync v0.1.0
golang.org/x/sys v0.3.0
golang.org/x/term v0.3.0
golang.org/x/text v0.5.0
golang.org/x/time v0.0.0-20191024005414-555d28b269f0
golang.org/x/tools v0.3.0
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
google.golang.org/api v0.30.0
google.golang.org/appengine v1.6.6
google.golang.org/genproto v0.0.0-20200825200019-8632dd797987
google.golang.org/grpc v1.31.0
google.golang.org/protobuf v1.28.1
gopkg.in/alecthomas/kingpin.v2 v2.2.6
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c
gopkg.in/errgo.v2 v2.1.0
gopkg.in/inf.v0 v0.9.1
gopkg.in/yaml.v2 v2.4.0
gopkg.in/yaml.v3 v3.0.1
grpc.go4.org v0.0.0-20170609214715-11d0a25b4919
honnef.co/go/tools v0.0.1-2020.1.4
lukechampine.com/blake3 v1.1.7
nhooyr.io/websocket v1.8.7
rsc.io/binaryregexp v0.2.0
rsc.io/quote/v3 v3.1.0
rsc.io/sampler v1.3.0
sourcegraph.com/sourcegraph/go-diff v0.5.0
sourcegraph.com/sqs/pbtypes v0.0.0-20180604144634-d3ebe8f20ae4

vernonr3 commented 1 year ago

I've reproduced the problem after about 190 clients are spawned, though the error varies. With a compiled binary it was the same as SOVLOOKUP reported; while debugging to find its source I've seen both "received data 503" and "dial backoff". My suspicion is that the error ultimately comes from go-ipfs and that its "point of entry" is the dialPeer method in p2p/host/basic/basic_host.go (line 736). Having said that, I'm unsure whether this is a valid use case: should a tight loop be able to spawn hundreds or thousands of clients without pause or error?

MarcoPolo commented 1 year ago

I took your repro, used stock go-libp2p parts, and can't reproduce it. Maybe you're hitting the resource limits set by the resource manager?

Does the following also fail for you?

https://gist.github.com/MarcoPolo/ab646ca5e8853a560a94ce7727daacce

SOVLOOKUP commented 1 year ago

@vernonr3 In my real-world application, the device may communicate with hundreds or thousands of peers at the same time, so I want to test the upper limit of peers that libp2p can connect to.

After running the example given by @MarcoPolo, I found that libp2p seems to be able to connect to a maximum of about 1000 peers.

If you change the limit from 1024 to 2048, the program panics with a "stream reset" error once the number of connections reaches about 1000:

    limits := rcmgr.PartialLimitConfig{
        System: rcmgr.ResourceLimits{
            Conns:        2048,
            ConnsInbound: 2048,
        },
        Transient: rcmgr.ResourceLimits{
            Conns:        2048,
            ConnsInbound: 2048,
        },
    }

vernonr3 commented 1 year ago

@SOVLOOKUP Can we separate the number of peers your device "may" communicate with from the speed at which connections are created? Your sample code had a delay of 1/10 second between spawning goroutines. What made you choose this particular value? I'm finding that the delay between creating connections seems to affect what happens, although this may well be hardware- or OS-specific.

Wondertan commented 1 year ago

Celestia observes a similar issue on the production network with ~1000 peers. On start, our nodes connect to bootstrappers, and some cannot connect due to dial backoff. We have infinite resource limits and a high connection manager threshold on them, so why that would happen is unclear.

vernonr3 commented 1 year ago

I've done further work on this, taking @MarcoPolo's code as the basis. There appears to be a problem when rapidly spawning lots of connections, though perhaps not always (it may depend on OS, hardware, etc.).

Two branches in my repo (spawnrate folder) worth looking at:

a) Pattern of missing connections: https://github.com/vernonr3/libp2p_tests/tree/inside_goroutine/spawnrate adds instrumentation and a parameterized delay inside each goroutine before it creates its connection. The interesting results (stdout piped to a file) are in MissingDelay150.txt, MissingDelay200.txt and MissingDelay250.txt. The "missing" connections are those approaching each multiple of the delay factor (see commit eb285ca for further details). This doesn't feel right; something nasty is happening...

b) How many connections are missed, and why: https://github.com/vernonr3/libp2p_tests/tree/outside_goroutine/spawnrate moves the delay outside the goroutine. This stops the bunching seen in case a) and produces a more even spawn rate.

I counted the "start dialing" vs. "finished dialing" debug messages in DialingCompare4.txt (these messages are emitted from p2p/host/basic/basic_host.go; see commit 2f42cfc for details of what I did). Based on the documentation, my belief is that ProtocolIdentify (handshake3) isn't completing correctly in the missing cases. I don't know why yet.

vernonr3 commented 1 year ago

Further work: see spawnrate2. The output is in the text file output.txt.

I added extra logging to a local copy of the go-libp2p library and experimented. Findings so far:

Closing the client (at the end of newClient) gets rid of the problem.

Deliberately setting the IP listen address to 127.0.0.1 improves the situation slightly: it can manage 1000 connections, but not many more.

There are large numbers of ErrReset errors issued from the Read and Write functions in transport/quic/stream.go, which show up as "stream reset" on the client end.

At a higher level, the error appears within the client's call to SelectProtoOrFail. The actual error occurs in quic-go/receive_stream.go, I think as a result of cancelRead.

Since the server and the clients are all still alive (even if the clients go quiet after their initial ping), I'm not sure why these cancellations are happening.

Wondertan commented 1 year ago

@vernonr3, the involvement of SelectProtoOrFail reminds me of an issue my team had. The root cause was buggy code on our side: we didn't handle EvtPeerConnectednessChanged events, which blocked the Identify protocol in SelectProtoOrFail and, in turn, blocked every other new stream. Double-check your event handling.

Wondertan commented 1 year ago

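The issue I described earlier (some of our nodes on the ~1000-peer production network failing to connect to bootstrappers because of dial backoff) turned out to be a problem on our side, again. go-libp2p works perfectly, as always 😺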


MarcoPolo commented 1 year ago

@vernonr3 I'm guessing you're hitting the resource manager's limits on the total number of active streams for ping. The resource manager is meant to protect you from being DoS'd, but if you want to see how many connections you can spin up to yourself as fast as possible, try this diff:

diff --git a/spawnrate2/main.go b/spawnrate2/main.go
index e71afb2..33aaa95 100644
--- a/spawnrate2/main.go
+++ b/spawnrate2/main.go
@@ -6,44 +6,33 @@ import (
    "fmt"
    "sync"
    "sync/atomic"
-   "time"

    logging "github.com/ipfs/go-log/v2"
    libp2p "github.com/libp2p/go-libp2p"
    "github.com/libp2p/go-libp2p/core/host"
+   "github.com/libp2p/go-libp2p/core/network"
    "github.com/libp2p/go-libp2p/core/peer"
-   rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
    "github.com/libp2p/go-libp2p/p2p/net/connmgr"
    "github.com/libp2p/go-libp2p/p2p/protocol/ping"
 )

 func setupLogging() {
-   err := logging.SetLogLevel("ping", "debug")
-   if err != nil {
-       panic(err)
-   }
-   err = logging.SetLogLevel("upgrader", "debug")
-   if err != nil {
-       panic(err)
-   }
-   err = logging.SetLogLevel("quic-transport", "debug")
-   if err != nil {
-       panic(err)
-   }
-   err = logging.SetLogLevel("webtransport", "debug")
+   var err error
+   err = logging.SetLogLevel("quic-transport", "warn")
    if err != nil {
        panic(err)
    }
-
 }

 func controlclients(s host.Host, clients int) {
+   sem := make(chan struct{}, 8)
    n := atomic.Int32{}
    wg := sync.WaitGroup{}
    for i := 0; i < clients; i++ {
        wg.Add(1)
        go func(i int) {
-           time.Sleep(time.Duration(i%100) * 100 * time.Millisecond)
+           sem <- struct{}{}
+           defer func() { <-sem }()
            defer wg.Done()
            err := newClient(peer.AddrInfo{
                ID:    s.ID(),
@@ -68,30 +57,7 @@ func main() {
    clients := flag.Int("clients", 1000, "Set number of clients")
    flag.Parse()
    fmt.Printf("Running with setip=%v and number of clients %d\n", *setip, *clients)
-   limits := rcmgr.PartialLimitConfig{
-       System: rcmgr.ResourceLimits{
-           Streams:         rcmgr.LimitVal((*clients) * 2),
-           StreamsInbound:  rcmgr.LimitVal((*clients) * 2),
-           StreamsOutbound: rcmgr.LimitVal((*clients) * 2),
-           Conns:           rcmgr.LimitVal((*clients) * 2),
-           ConnsInbound:    rcmgr.LimitVal((*clients) * 2),
-           ConnsOutbound:   rcmgr.LimitVal((*clients) * 2),
-       },
-       Transient: rcmgr.ResourceLimits{
-           Streams:         rcmgr.LimitVal((*clients) * 2),
-           StreamsInbound:  rcmgr.LimitVal((*clients) * 2),
-           StreamsOutbound: rcmgr.LimitVal((*clients) * 2),
-           Conns:           rcmgr.LimitVal((*clients) * 2),
-           ConnsInbound:    rcmgr.LimitVal((*clients) * 2),
-           ConnsOutbound:   rcmgr.LimitVal((*clients) * 2),
-       },
-   }
-
-   limiter := rcmgr.NewFixedLimiter(limits.Build(rcmgr.DefaultLimits.AutoScale()))
-   rmgr, err := rcmgr.NewResourceManager(limiter)
-   if err != nil {
-       panic(err)
-   }
+   rmgr := &network.NullResourceManager{}

    cmgr, err := connmgr.NewConnManager(0, *clients)
    if err != nil {

This disables the resource manager and uses a semaphore to limit the number of concurrent new connection requests. You want to limit these because go-libp2p keeps a queue of pending connections before they're accepted, and to avoid a DoS vector this queue is capped. If connections aren't accepted from the queue, further connections will be dropped.

Again, you probably don't want to use a null resource manager in production. And in a real network you aren't going to get a flood of concurrent requests at once (unless you're being DoS'd or you know what you're doing).

vernonr3 commented 1 year ago

Thank you very much, @MarcoPolo, for this suggestion.

Having pursued it, I've discovered the following:

Replacing the resource manager with the null resource manager and adding a limiting semaphore, as you suggest, makes a huge difference.

In the libp2p triage discussion last night, Marten Seemann confirmed that the QUIC handshake, which is required to accept a connection, involves a fair amount of cryptography. @MarcoPolo stated that performance on a single machine (i.e. connecting to localhost) is very different from performance across a real network. The latter suffers from latencies etc. that would even out some of the peak load experienced on a single host.

The team therefore proposes closing this issue for now, @SOVLOOKUP. They would be happy to reopen it should problems arise in production and a more detailed use case be provided.