paritytech / substrate-telemetry

Polkadot Telemetry service
GNU General Public License v3.0

Error while dialing /dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit: Custom { kind: Other, error: Timeout } #471

Open imstar15 opened 2 years ago

imstar15 commented 2 years ago

My running node is getting the following error:

2022-05-19 14:13:16 [Parachain] ❌ Error while dialing /dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F: Custom { kind: Other, error: Timeout }

2022-05-19 14:13:16 failed to associate send_message response to the sender

Environment:
Node Provider: onfinality
Cloud Provider: AWS
Region: Tokyo
Node Type: Fullnode
Syncing status: Synced
Launch params: --telemetry-url='wss://telemetry.polkadot.io/submit 0'
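The address in the log line is a multiaddr, while the launch parameter is a plain wss:// URL, so comparing the two is easier once the multiaddr is decoded. The following is a minimal sketch, assuming the address always has the `/dns/<host>/tcp/<port>/x-parity-wss/<percent-encoded path>` layout seen in the log above; the function name `multiaddr_to_wss_url` is made up for illustration and is not part of Substrate.

```python
from urllib.parse import unquote


def multiaddr_to_wss_url(multiaddr: str) -> str:
    """Decode a /dns/<host>/tcp/<port>/x-parity-wss/<path> multiaddr into a wss:// URL.

    Hypothetical helper: it only handles the exact layout seen in this issue.
    """
    parts = multiaddr.strip("/").split("/")
    # Expected layout: ["dns", host, "tcp", port, "x-parity-wss", encoded_path]
    if (
        len(parts) != 6
        or parts[0] != "dns"
        or parts[2] != "tcp"
        or parts[4] != "x-parity-wss"
    ):
        raise ValueError(f"unsupported multiaddr layout: {multiaddr!r}")
    host = parts[1]
    port = int(parts[3])
    path = unquote(parts[5])  # "%2Fsubmit%2F" -> "/submit/"
    # 443 is the default port for wss, so it can be left out of the URL.
    netloc = host if port == 443 else f"{host}:{port}"
    return f"wss://{netloc}{path}"


if __name__ == "__main__":
    addr = "/dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F"
    print(multiaddr_to_wss_url(addr))  # wss://telemetry.polkadot.io/submit/
```

The trailing number in `--telemetry-url` (the `0` after the URL) is a separate argument and is not part of the address, so it does not appear in the multiaddr.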

I searched and found this post, which suggests the problem is related to the telemetry server's rate limit: https://forum.subspace.network/t/error-while-dialing-dns-telemetry-polkadot-io-tcp-443-x-parity-wss-submit-custom-kind-other-error-timeout/45

How can I solve this issue please?

Thanks!

imstar15 commented 2 years ago

I just looked at telemetry and the node is on it.

Why does such an unstable state occur?

jsdw commented 2 years ago

Do you get this error once (sometimes) when first starting up your node, or consistently?

I've seen this in the past occasionally, and I've no idea why it happens, but the node always connects successfully on the next attempt when the error does appear. I have tended to put it down to something substrate related, since I have never had issues connecting to telemetry in tests, but I'm not sure.

It is true that telemetry will not show more than a certain number (1000) of nodes for most chains, and so you would be limited in that case.
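To make the per-chain limit concrete, here is a rough model of how such a cap could behave. It is only a sketch under assumptions: the class `ChainAdmission`, its methods, and the idea of an explicit set of exempt chains are all hypothetical (the "most chains" wording suggests some chains are exempt), and none of this is taken from the telemetry server's actual code.

```python
from collections import defaultdict

MAX_NODES_PER_CHAIN = 1000  # the figure quoted in this thread


class ChainAdmission:
    """Hypothetical model of a per-chain node cap; not the real telemetry code."""

    def __init__(self, exempt_chains=(), cap=MAX_NODES_PER_CHAIN):
        self.exempt_chains = set(exempt_chains)  # chains assumed to have no cap
        self.cap = cap
        self.counts = defaultdict(int)  # chain name -> nodes currently shown

    def try_add(self, chain: str) -> bool:
        """Return True if a node on `chain` is admitted, False if the chain is full."""
        if chain not in self.exempt_chains and self.counts[chain] >= self.cap:
            return False
        self.counts[chain] += 1
        return True

    def remove(self, chain: str) -> None:
        """Free a slot when a node on `chain` disconnects."""
        if self.counts[chain] > 0:
            self.counts[chain] -= 1


if __name__ == "__main__":
    admission = ChainAdmission(exempt_chains={"Polkadot"}, cap=2)
    print([admission.try_add("SomeParachain") for _ in range(3)])  # [True, True, False]
    print([admission.try_add("Polkadot") for _ in range(3)])       # [True, True, True]
```

In this model a node that is refused is simply not shown; a cap of this kind would not by itself explain a dial timeout, which happens before any such check could run.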

It would be nice to know why this error occasionally pops up in substrate/polkadot, but it doesn't actually lead to any issues if it's the occasional when-node-is-first-started error I've seen.

irsal commented 2 years ago

Hey @jsdw - we've only seen this with this service provider, who relayed that it might be a rate limiter issue with telemetry.

We are not limited by the 1000 number.

imstar15 commented 2 years ago

These nodes still have timeout errors from time to time.

Please help.

jsdw commented 2 years ago

@irsal what do you mean by "this service provider"? Telemetry doesn't limit connections beyond the 1000-node-per-non-whitelisted-chain number (though if the bandwidth of a connection is unexpectedly high it will also kill the connection, iirc, because that indicates some issue, unintentional or otherwise, with the connection).

@imstar15 do the nodes disappear and reappear in telemetry or remain there throughout? How often do the timeout errors occur? I'll need a bunch more information I think to really be able to help. Would you be able to provide some steps to reproduce what you are seeing?
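One way to answer the "how often" question is to count the timeout lines in the node's log, bucketed by hour. The sketch below assumes log lines start with a `YYYY-MM-DD HH:MM:SS` timestamp and contain the same "Error while dialing ... error: Timeout" text as the line quoted at the top of this issue; the function name `timeouts_per_hour` and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
# 2022-05-19 14:13:16 [Parachain] ❌ Error while dialing /dns/telemetry.polkadot.io/...: Custom { kind: Other, error: Timeout }
TIMEOUT_LINE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} .*"
    r"Error while dialing \S*telemetry\S*: .*error: Timeout"
)


def timeouts_per_hour(lines):
    """Count telemetry dial timeouts per hour from an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_LINE.match(line)
        if match:
            counts[f"{match['day']} {match['hour']}:00"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2022-05-19 14:13:16 [Parachain] ❌ Error while dialing "
        "/dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F: "
        "Custom { kind: Other, error: Timeout }",
        "2022-05-19 14:13:16 failed to associate send_message response to the sender",
    ]
    for hour, count in sorted(timeouts_per_hour(sample).items()):
        print(hour, count)  # 2022-05-19 14:00 1
```

To run it on a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.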

imstar15 commented 2 years ago

@jsdw Thanks! These nodes stay missing from telemetry for long periods. I don't currently know the steps to reproduce: I tried starting a new node and it communicates smoothly with telemetry, but the nodes that have been running for a long time continue to have timeout errors.

If I have any other news, I will let you know.

irsal commented 2 years ago

We'll close this out unless we have any additional context to provide.

jsdw commented 2 years ago

Thanks! Will close this unless any further information comes in. I'm not sure whether it's related to telemetry or Substrate at present (or just network issues in general). If you guys manage to find a way to reproduce it, please let me know!

ltfschoen commented 1 year ago

I suggest re-opening this. I've created a reproducible example of this error: Error while dialing /dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F: Custom { kind: Other, error: Timeout }. It happens every time I run the command in the "Run Cargo Contract Node" section of the README of https://github.com/ltfschoen/InkTest with commit d7c8a98, where I run the following. It should be reproducible, since it runs in a Docker container that is based on Parity Docker containers and uses specific versions with Docker. I've configured network_mode: host in the docker-compose.yml file, so all ports should be exposed in the Docker container as they are on the host machine:

jsdw commented 7 months ago

I'll re-open since this seems to be an ongoing thing, though we don't have much bandwidth to look into it right now I'm afraid!