paritytech / polkadot

Polkadot Node Implementation
GNU General Public License v3.0

--peers-in/out is not limiting connections #6008

Closed dcolley closed 2 years ago

dcolley commented 2 years ago

On my Kusama validator we often have 2500+ (3000-4000 while in active set) network connections.

These are visible with "sudo netstat -tuWan". Add "| wc -l" to count the lines. Fewer than 20 connections are internal (Prometheus etc.).

I added --peers-in=25 and --peers-out=25, but this has no effect on the actual number of connections.

2 questions:

dcolley commented 2 years ago

Example output from netstat:

tcp        0      0 x.x.x.x:30333     80.190.132.234:22482    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:12677    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:64939    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:41125    ESTABLISHED
tcp        0    354 x.x.x.x:30333     80.190.132.234:4301     ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:55633    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:29542    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:51829    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:25726    ESTABLISHED
tcp       62      0 x.x.x.x:30333     80.190.132.234:57693    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:49161    ESTABLISHED
aperture-sandi commented 2 years ago

We have seen 50+ active connections from a single IP address (see the "code snippet" below).

Some guidance from the devs on how many requests per second per IP address are acceptable, and how many simultaneous connections per IP are acceptable, would help. Knowing this, we can at least set up some filters.

sudo netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 x.x.x.x:30333     80.190.132.234:22482    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:12677    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:64939    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:41125    ESTABLISHED
tcp        0    354 x.x.x.x:30333     80.190.132.234:4301     ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:55633    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:29542    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:51829    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:25726    ESTABLISHED
tcp       62      0 x.x.x.x:30333     80.190.132.234:57693    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:49161    ESTABLISHED
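
A minimal sketch of how connections per remote IP could be tallied from output like the above, as a starting point for deciding on filters. It assumes the default Linux `netstat -ant` column layout; the function name and the second sample address (10.0.0.7) are made up for illustration.

```python
from collections import Counter


def count_established_by_ip(netstat_text):
    """Count ESTABLISHED TCP connections per remote IP in `netstat -ant` output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local foreign state.
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # Drop the port: everything after the last ':'.
        remote_ip = fields[4].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


# Two rows copied from the thread plus one hypothetical TIME_WAIT row.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 x.x.x.x:30333     80.190.132.234:22482    ESTABLISHED
tcp        0    354 x.x.x.x:30333     80.190.132.234:4301     ESTABLISHED
tcp        0      0 x.x.x.x:30333     10.0.0.7:51000          TIME_WAIT
"""

if __name__ == "__main__":
    for ip, n in count_established_by_ip(SAMPLE).most_common():
        print(ip, n)
```

Feeding it the real output (for example by piping `sudo netstat -ant` into a small wrapper that reads stdin) would list the busiest remote addresses first.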
aperture-sandi commented 2 years ago

Not sure if this is related, but we are seeing this in the service logs as well:

Incoming substream from 12D3KooWJiAsAEPfo6r13xECCjxR1GNRqsRJ9o7mgb2E1Xx4198C exceeding maximum number of negotiating inbound streams 2048 on connection. Dropping. See PoolConfig::with_max_negotiating_inbound_streams.

dcolley commented 2 years ago

Is there any update on this please?

As reported by another validator operator, this seems like a DDoS attack? We don't think it's valid for a single IP to have multiple connections into our validators.

sudo netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 x.x.x.x:30333     80.190.132.234:22482    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:12677    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:64939    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:41125    ESTABLISHED
tcp        0    354 x.x.x.x:30333     80.190.132.234:4301     ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:55633    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:29542    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:51829    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:25726    ESTABLISHED
tcp       62      0 x.x.x.x:30333     80.190.132.234:57693    ESTABLISHED
tcp        0      0 x.x.x.x:30333     80.190.132.234:49161    ESTABLISHED
ggwpez commented 2 years ago

cc @niklasad1 (not sure whom to ping here)

niklasad1 commented 2 years ago

/cc @tomaka @kpp

kpp commented 2 years ago

Shame on me, what is --peers-in/out? I can't grep it in the code.

aperture-sandi commented 2 years ago
        --in-peers <COUNT>
            Maximum number of inbound full nodes peers [default: 25]

        --out-peers <COUNT>
            Specify the number of outgoing connections we're trying to maintain [default: 25]
kpp commented 2 years ago

Do you get these logs? https://github.com/paritytech/substrate/blob/7202ca616799a1d78e37ae8ec0093f16c49417c6/client/network/src/protocol/notifications/behaviour.rs#L1022

aperture-sandi commented 2 years ago

I can't find this in our logs

aperture-sandi commented 2 years ago

A user provided this image. The desired peer count graph seems to indicate that the node is seeking to have over 1000 peers at all times. Is always "desiring" 1k+ peers a feature or a bug?

Further, the "polkadot_parachain_desired_peer_count" metric is only reported when the validator is in the active set, so we don't have any info on what its intentions are when inactive.

image

bkchr commented 2 years ago

If that is Kusama and a validator, then yes. Each validator is connected to each other validator plus validators of the previous session.
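
As a rough illustration of why the desired peer count can exceed 1000: if the desired set is taken to be the union of the current and previous session's validators, minus the node itself, then even a small turnover between sessions pushes it above the active set size. The set sizes and the overlap below are hypothetical, and the function is a sketch of the arithmetic, not the node's actual logic.

```python
def desired_peer_count(current, previous, me):
    """Peers a validator wants: current plus previous session validators, minus itself."""
    return len((set(current) | set(previous)) - {me})


# Hypothetical sessions: 1000 validators each, 50 of them rotated between sessions.
current = {f"v{i}" for i in range(1000)}
previous = {f"v{i}" for i in range(50, 1050)}

if __name__ == "__main__":
    print(desired_peer_count(current, previous, "v0"))  # prints 1049
```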

aperture-sandi commented 2 years ago

Thank you, yes, this is a KSM validator. The validator does not appear to automatically release these extra peers after it leaves the active set. A node restart is required to force the peers back to "inactive" levels. Perhaps this is the issue that brought up this topic: the full 1k+ peers linger when inactive?

bkchr commented 2 years ago

Could be that we don't clean this up.

dcolley commented 2 years ago

I think the point is that these connections possibly come from gossip and are not controlled/limited by --peers-in.

ordian commented 2 years ago

We use so-called "reserved sets" for parachain-core networking. If your validator is a parachain validator (polkadot_node_is_parachain_validator), it will remain connected for the next 6 sessions (dispute_period).

After that, validators in the active set should remove you from their "reserved set" and disconnect from you. However, your validator will keep its "reserved set" intact. So it will try connecting to its "reserved set", but there should be only a limited number of slots available in each validator for "non-reserved" peers (11).

Maybe we should change that and empty our reserved set.
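
A toy model of the behaviour described above may help: peers in the reserved set are always accepted, while everyone else competes for a fixed number of non-reserved slots. The class and method names are hypothetical and this is not Substrate's peerset code; it only illustrates why a general inbound limit would not cap reserved connections.

```python
class PeerSlots:
    """Toy model: reserved peers bypass the slot limit, others are capped."""

    def __init__(self, max_non_reserved):
        self.reserved = set()
        self.connected = set()
        self.max_non_reserved = max_non_reserved

    def non_reserved_connected(self):
        return len(self.connected - self.reserved)

    def try_accept(self, peer):
        # Reserved peers are always let in; others need a free slot.
        if peer in self.reserved or self.non_reserved_connected() < self.max_non_reserved:
            self.connected.add(peer)
            return True
        return False


if __name__ == "__main__":
    node = PeerSlots(max_non_reserved=1)  # the thread mentions 11; 1 keeps the demo short
    node.reserved = {"validator-a", "validator-b"}
    for peer in ["validator-a", "stranger-x", "stranger-y", "validator-b"]:
        print(peer, node.try_accept(peer))
```

In this model a validator that has left the active set is just another "stranger" to the active validators, so it would be limited by their non-reserved slots even though it keeps dialing its own stale reserved set.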