paritytech / substrate-telemetry

Polkadot Telemetry service
GNU General Public License v3.0

How should we group nodes into chains on the backend/UI? #373

Open jsdw opened 2 years ago

jsdw commented 2 years ago

Currently, we group nodes into chains on the backend by their genesis hash. We pick the chain name to display by taking the most commonly seen one for a given genesis hash.
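To make the current behaviour concrete, here is a minimal sketch of "group by genesis hash, display the most common name". It is not the actual backend code: `GenesisHash`, `ChainNames` and `Chains` are hypothetical names for illustration, and the alphabetical tie-break is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 32-byte genesis hash.
type GenesisHash = [u8; 32];

/// Counts how often each chain name was reported for one genesis hash.
#[derive(Default)]
struct ChainNames {
    counts: HashMap<String, usize>,
}

impl ChainNames {
    fn record(&mut self, name: &str) {
        *self.counts.entry(name.to_owned()).or_insert(0) += 1;
    }

    /// Most commonly seen name. Ties go to the alphabetically first name
    /// (an assumption, just to keep the result deterministic).
    fn most_common(&self) -> Option<&str> {
        self.counts
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .map(|(name, _)| name.as_str())
    }
}

/// Nodes are grouped purely by genesis hash.
#[derive(Default)]
struct Chains {
    by_genesis: HashMap<GenesisHash, ChainNames>,
}

impl Chains {
    fn add_node(&mut self, genesis: GenesisHash, name: &str) {
        self.by_genesis.entry(genesis).or_default().record(name);
    }

    /// Name shown in the UI for this genesis hash.
    fn display_name(&self, genesis: &GenesisHash) -> Option<&str> {
        self.by_genesis.get(genesis)?.most_common()
    }
}
```

With this shape, a fork that keeps the original genesis hash lands in the same group as the chain it forked from, whatever name it reports.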

It was pointed out in https://github.com/paritytech/substrate-telemetry/issues/350#issuecomment-899523779 that grouping by genesis hash wouldn't accommodate forks properly.

Should we group nodes into chains solely by the chain name given? This may actually simplify some backend logic: we wouldn't need to find the most common name or handle chain renames; we'd just assume that different chain names always mean different chains, irrespective of genesis hash.

Is there something else we should consider?

I'd be keen to hear what others think about this!

dvdplm commented 2 years ago

Not sure there is a perfect solution to this. Chains do change names (surprisingly often).

Back this spring when we rolled out a cap on the # of nodes for third party chains we saw one chain change its name three times in a (vain) attempt to work around the cap (I guess they figured we had a denylist rather than an allowlist).

The scenario where two distinct chains share the same genesis hash is a tricky one. We'd have to either bless one of the two or introduce a way to mark certain block heights as "special" after the fact, and maintain a mapping to determine which fork a node is on. I could see that going all kinds of wrong though. :/

jsdw commented 2 years ago

It's a tricky one, by the sounds of it! I can see pros and cons of each. I'm not sure what the best choice is on balance.

Should chains be grouped by both their genesis hash and name, I wonder. If some nodes change the chain name, we would just assume that they aren't related in the UI (maybe it's a fork situation), but genesis hash helps prevent nodes from different chains from getting mingled together in the UI grouping (for malicious or other reasons).

Of course, if nodes belonging to the same chain frequently advertise different chain names from each other (does this happen?), that would lead to a lot of false separate groups.
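A rough sketch of what grouping by both fields could look like, assuming a composite key. `ChainKey` and `group_nodes` are made-up names for illustration, not existing backend types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 32-byte genesis hash.
type GenesisHash = [u8; 32];

/// Composite grouping key: two nodes share a group only if both fields match.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ChainKey {
    genesis: GenesisHash,
    name: String,
}

/// Counts nodes per (genesis hash, chain name) group.
fn group_nodes(nodes: &[(GenesisHash, &str)]) -> HashMap<ChainKey, usize> {
    let mut groups: HashMap<ChainKey, usize> = HashMap::new();
    for (genesis, name) in nodes {
        let key = ChainKey {
            genesis: *genesis,
            name: (*name).to_owned(),
        };
        *groups.entry(key).or_insert(0) += 1;
    }
    groups
}
```

The trade-off shows up directly: a same-named chain with a different genesis hash stays separate, but a node reporting a different name for the same genesis hash also ends up in its own group.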

We could probably expose some sort of metrics at least to keep track of the chain names we see for each genesis hash, to get a feel for what things are like "in the wild", which might make it easier to decide what (if anything) to do.
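Something along these lines might be enough to start with: remember every name seen per genesis hash and report the hashes that have more than one. `NameMetrics` and its methods are hypothetical; how the result would be exposed (logs, a metrics endpoint) is left open.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a 32-byte genesis hash.
type GenesisHash = [u8; 32];

/// Remembers every distinct chain name reported for each genesis hash.
#[derive(Default)]
struct NameMetrics {
    names_by_genesis: BTreeMap<GenesisHash, BTreeSet<String>>,
}

impl NameMetrics {
    /// Call once per connecting node.
    fn observe(&mut self, genesis: GenesisHash, name: &str) {
        self.names_by_genesis
            .entry(genesis)
            .or_default()
            .insert(name.to_owned());
    }

    /// Genesis hashes that were seen with more than one chain name,
    /// together with those names in sorted order.
    fn ambiguous(&self) -> Vec<(&GenesisHash, Vec<&str>)> {
        self.names_by_genesis
            .iter()
            .filter(|(_, names)| names.len() > 1)
            .map(|(genesis, names)| (genesis, names.iter().map(String::as_str).collect()))
            .collect()
    }
}
```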

dvdplm commented 2 years ago

> Of course, if nodes belonging to the same chain frequently advertise different chain names from each other (does this happen?), that would lead to a lot of false separate groups.

Yes, this happens. I wouldn't say it's frequent but it happens. One example that comes to mind is "forgotten nodes" spun up by someone at some point and then never decommissioned and never updated, so there have been "Kusama-CC" nodes floating around (I think that was the name, from the very very early days, sort of a pre-release?). Same genesis, old name.

I don't have a very good suggestion on what to do here. It's tempting to put this off until we have to deal with it.

@maciejhirsz thoughts?

grenade commented 2 years ago

just to muddy the waters further, we (manta.network) run a few different relay chain environments that host different blockchains of the same parachain definition/genesis. eg:

currently this results in both Baikal/Calamari and Como/Calamari showing up on the same telemetry tab, which is unfortunate since average block time calculations are meaningless in this scenario.

the only solution i have found is to run different telemetry services for each relay/para set.

i don't know what a better solution would look like, but i wanted to offer this use case as an example of how current telemetry behaves in the wild in the hopes that we can fix it.

if a parachain's relay id is available to telemetry (idk if this is the case), it would be nice if a composite relay/para identifier could segregate the telemetry data.
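as a rough sketch of that idea, assuming the relay chain's genesis hash were reported alongside the parachain's (which may not be the case today), the grouping key could be a pair. `ChainId` and `group_by_chain` are hypothetical names, not existing telemetry types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 32-byte genesis hash.
type GenesisHash = [u8; 32];

/// Composite relay/para identifier (assumed shape).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ChainId {
    /// Genesis hash of the relay chain; `None` if the node is not on a parachain.
    relay: Option<GenesisHash>,
    /// Genesis hash of the chain itself.
    genesis: GenesisHash,
}

/// Collects node names per composite identifier, one entry per telemetry tab.
fn group_by_chain<'a>(nodes: &[(ChainId, &'a str)]) -> HashMap<ChainId, Vec<&'a str>> {
    let mut tabs: HashMap<ChainId, Vec<&'a str>> = HashMap::new();
    for (id, node_name) in nodes {
        tabs.entry(id.clone()).or_default().push(*node_name);
    }
    tabs
}
```

with a key like this, Baikal/Calamari and Como/Calamari would land on separate tabs even though the parachain genesis is identical.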