Add info about validators being nominated by 1KV accounts

jimiflowers commented 3 months ago

Hi team!

I have exposed this issue in the Kusama Thousand Validators group on matrix, and they have urged me to open this issue.

I have noticed that there are nodes with low scores that are being nominated over other nodes with much higher scores. These are just a few examples, but many more can be seen at every change of era:

Examples:

Era: 6,406 Session: 37,822

Nominated nodes with lower score than other nodes not nominated

Name: Madbustazz Addr: GeqoXYixFsP9wLtfjDCrjmtHa47h4LEbxH1B46cBAm5uKq6 Score: 548 points Nominated by: G1rrUNQSk7CjjEmLSGcpNu72tVtyzbWdUvgmSer9eBitXWf

Name: WinterSpring Addr: F5FqpNUCPEnsSme6i8jQ6q2Y7iCaMwKuqQDjcBczXwKyS2A Score: 572 points Nominated by: HgTtJusFEn2gmMmB5wmJDnMRXKD6dzqCpNR7a99kkQ7BNvX

Nodes with higher score and not nominated:

Name: MerkleTribe Ξ Kazaki Addr: HsXaiGQ7cAUctDTnkX535xFamCEnki8XvtnMMMU6LRapePw Score: 864 points Not nominated by 1KV accounts.

Name: MarketAcross-BB/2 Addr: FUDBmPmHvoLnCtxbW55MhyYnhiN33CwtvFYux5CcfSSC8As Score: 737 points Not nominated

As you can see, the difference in points cannot be explained by the "Randomness" multiplier, cos the maximum multiplier is 0.15, and the difference in points is higher than this.
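To make the arithmetic explicit, here is a minimal sketch. It assumes the randomness factor boosts a score by at most 15% (score × 1.15) and that the higher-scoring node gets no boost at all; that reading of the "0.15" figure and the function name are assumptions for illustration, not taken from the backend code.

```python
MAX_RANDOMNESS = 0.15  # maximum multiplier quoted above, assumed to mean "up to +15%"


def gap_explained_by_randomness(low_score: float, high_score: float,
                                max_randomness: float = MAX_RANDOMNESS) -> bool:
    """True if the lower score, with the maximum boost, reaches the higher one."""
    return low_score * (1 + max_randomness) >= high_score


# Scores reported above for era 6,406: (nominated, score, not nominated, score)
pairs = [
    ("Madbustazz", 548, "MerkleTribe Ξ Kazaki", 864),
    ("WinterSpring", 572, "MarketAcross-BB/2", 737),
]

for low_name, low, high_name, high in pairs:
    best_case = low * (1 + MAX_RANDOMNESS)
    verdict = "explained" if gap_explained_by_randomness(low, high) else "not explained"
    print(f"{low_name}: {low} -> at most {best_case:.1f} vs {high_name}: {high} ({verdict})")
```

Under that assumption, 548 points could reach at most about 630 and 572 points about 658, both still below the scores of the nodes that were not nominated.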

ironoa commented 3 months ago

mmm, where is your data coming from? None of those you mentioned were nominated during era 6,406. Please consider https://kusama.w3f.community (or the chain itself) as a source of truth for the nominations

i.e. GeqoXYixFsP9wLtfjDCrjmtHa47h4LEbxH1B46cBAm5uKq6 was nominated in era 6,405

jimiflowers commented 3 months ago

Hi Alessio.

The info sources were:

PolkadotJS Apps Console (to see nominations) and the 1KV Backend API (https://kusama.w3f.community/nominators) to check if nominator belongs to 1KV programme.
These dashboards to see the score of the nodes: https://1k.hirish.net/kusama & https://vegas1kv.com/

Maybe the problem with the information source is that there is no official dashboard that actually works. Your colleague @Kuba (admin of the official "Kusama Thousand Validators" matrix group) said "https://thousand-validators.kusama.network/" is going to be retired because it is being "superseded" by "community initiatives", but I don't really see anything that works properly and gives real-time and accurate information. Of course each validator could build their own dashboard (or fork an existing one), but I think that would be detrimental: the backend is already slow to respond (by its nature, it must fetch a lot of info on-chain) without being loaded with hundreds of extra requests.

jimiflowers commented 3 months ago

Hi Alessio.

I have done what you tell me (I have taken https://kusama.w3f.community as a source of truth) but I see that there are data that are not offered, such as which are the nominees in the current era. There are nominations from past eras, but not from the current era, so it is difficult to draw conclusions by crossing nomination data from a past era with score data that varies in real time. Can the nomination data from the current era be offered in the "nominations" endpoint?

ironoa commented 3 months ago

Hey Jimi

> (I have taken https://kusama.w3f.community/ as a source of truth) but I see that there are data that are not offered, such as which are the nominees in the current era.

As I'm writing we are in era 6421, as you can see the nominees are all there (just scroll to the bottom): https://kusama.w3f.community/nominators

> These dashboards to see the score of the nodes: https://1k.hirish.net/kusama & https://vegas1kv.com/

I cannot judge, I'd stick with the source of truth to draw conclusions.

> so it is difficult to draw conclusions by crossing nomination data from a past era with score data that varies in real time

I agree. I'd suggest changing the topic of this issue from "Anomalous behavior in the algorithm for the selection of validators to be nominated" (which is difficult to assess) to something related to the creation of a new feature: it would be nice to have a mapping in the https://kusama.w3f.community/nominators endpoint between each nominated validator and the score that made it happen. wdyt?

jimiflowers commented 3 months ago

You are right, Alessio. It takes some time to show data about the current era (the era 6421 data was not present in the first session of the era). I have changed the title of the issue to a more accurate one.

It could be interesting to have info about the score of every nominated validator at the moment of election; this way the election process would be clearer. Why do I ask for this? Cos I still see some validators with high scores without a nomination by 1KV, and validators with a "low score" (at least lower than others) that have been nominated in the same era, but I cannot figure out when the nomination process was initiated (a couple of hours before the era starts??), so I cannot give accurate info.

My small contribution can be seen here: https://flowerstake.io/1kv-stats/

It shows data from current era (if available) or from last era and data is fetched from source every 5 minutes

jimiflowers commented 3 months ago

Hi Alessio.

After seeing how the backend nomination mechanism works (or how it is supposed to work), I believe that the ideal, for transparency and clarity, would be to have an endpoint that shows the "snapshot" of the candidates used to make the nomination and which candidates have been selected. I think that would be enough to check if it is really always the highest scoring candidates that are selected, as right now it seems that there are quite a few low scoring candidates that are being nominated, when there are others with much higher scores that are not.

Of course, it is essential that the endpoint indicates the time (UTC) at which the process was carried out, otherwise it would not make much sense.
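To illustrate what I mean, here is a minimal sketch of such a snapshot and of the check it would allow. Every name in it (`SnapshotEntry`, `NominationSnapshot`, `skipped_higher_scores`, the field names) is hypothetical; nothing like this exists in the backend today as far as I know.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotEntry:
    stash: str
    score: float
    selected: bool  # True if the candidate was nominated in this round


@dataclass(frozen=True)
class NominationSnapshot:
    era: int
    taken_at: datetime  # UTC time at which the selection ran
    entries: tuple[SnapshotEntry, ...]


def skipped_higher_scores(snapshot: NominationSnapshot) -> list[SnapshotEntry]:
    """Unselected candidates whose score beats the lowest selected score."""
    selected = [e for e in snapshot.entries if e.selected]
    if not selected:
        return []
    cutoff = min(e.score for e in selected)
    skipped = [e for e in snapshot.entries if not e.selected and e.score > cutoff]
    return sorted(skipped, key=lambda e: e.score, reverse=True)


# Placeholder data: stashes and scores are made up for the example.
snapshot = NominationSnapshot(
    era=6421,
    taken_at=datetime.now(timezone.utc),
    entries=(
        SnapshotEntry("stash-A", 804, selected=False),
        SnapshotEntry("stash-B", 509, selected=True),
        SnapshotEntry("stash-C", 650, selected=True),
    ),
)
for entry in skipped_higher_scores(snapshot):
    print(f"{entry.stash} ({entry.score}) was skipped at {snapshot.taken_at.isoformat()}")
```

If the selection is purely score-based, the list returned for a real snapshot should be empty, or contain only gaps small enough to be explained by the randomness factor.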

jimiflowers commented 3 months ago

Hi Alessio,

Please, review this also (Session 37,918): This node, nominated by 1KV Accounts (HgTtJusFEn2gmMmB5wmJDnMRXKD6dzqCpNR7a99kkQ7BNvX) and active right now, is being marked as "active: false" in /candidates endpoint:

{"slotId":134,"kyc":false,"discoveredAt":1611074931808,"nominatedAt":6413,"offlineSince":0,"offlineAccumulated":0,"rank":4200,"faults":0,"unclaimedEras":[],"inclusion":0.13095238095238096,"name":"Simply Staking 2","stash":"GLJLgrKhPDzSvNCNjQ184si3Fvu3bzSJBzewkzEZRVLV2oe","kusamaStash":"","commission":10,"identity":{"name":"Simply Staking","address":"DNDBcYD8zzqAoZEtgNzouVp2sVxsvqzD4UdB5WrAUwjqpL8","verified":true,"subIdentities":[{"name":"2","address":"GLJLgrKhPDzSvNCNjQ184si3Fvu3bzSJBzewkzEZRVLV2oe","_id":"6602b944cab3219482852922"}],"display":"Simply Staking","email":"staking@simply-vc.com.mt","judgements":["Reasonable"],"twitter":"@Simply_VC","_id":"6602e713cab3219482b4d4a9"},"active":false,"bonded":181.471106553627,"valid":true,"validity"...

The same case for a lot of nodes:

Node --> Nominator
BLUEFIN_TUNA2 --> H4UgNEEN92YXz96AyQgwkJQSpXGdptYLkj9jXVKrNXjQHRJ
STAKELY --> H4UgNEEN92YXz96AyQgwkJQSpXGdptYLkj9jXVKrNXjQHRJ
TUTIFRUTINODE --> H54GA3nq3xeNrdbHkepAufSPMjaCxxkmfej4PosqD84bY3V
And more

NOTE: Well, forget about this, the real problem is the /erastats endpoint, cos it's returning the era "6421" while the current era is "6422", so both endpoints (/erastats and /candidates) are returning data from different eras. A real problem if we want to treat the endpoints as the "source of truth".

jimiflowers commented 3 months ago

The score system makes absolutely no sense. Here is a graph generated with data collected every 5 minutes from https://kusama.w3f.community. As you can see, the score varies every 5 minutes, increasing or decreasing by more than 60 points for no apparent reason (there has been no change in nominations, no change of provider, region, country, etc.):

[Image: screenshot 2024-04-02 182227]

However, for other candidates the score never varies, it is always the same, as in the case of the following two candidates, who have not changed their score in the last 2 days:

[Image]
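For anyone who wants to reproduce the two observations from their own samples, here is a minimal sketch. It assumes you already hold one candidate's scores as a list ordered in time (one value per 5-minute poll); the function names and the sample numbers are made up for the example.

```python
def score_jumps(samples: list[float], threshold: float = 60.0) -> list[tuple[int, float]]:
    """Return (index, delta) for each sample that differs from the
    previous one by more than `threshold` points."""
    return [
        (i, current - previous)
        for i, (previous, current) in enumerate(zip(samples, samples[1:]), start=1)
        if abs(current - previous) > threshold
    ]


def is_flat(samples: list[float]) -> bool:
    """True if the score never changed over the whole series."""
    return len(set(samples)) <= 1


# Placeholder series, not real data
volatile = [700.0, 765.0, 702.0, 705.0]
constant = [640.0, 640.0, 640.0]

print(score_jumps(volatile))
print(is_flat(constant))
```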

lobis commented 3 months ago

> The score system makes absolutely no sense. Here is a graph generated with data collected every 5 minutes from https://kusama.w3f.community. As you can see, the score varies every 5 minutes, increasing or decreasing by more than 60 points for no apparent reason (there has been no change in nominations, no change of provider, region, country, etc.):
>
> However, for other candidates the score never varies, it is always the same, as in the case of the following two candidates, who have not changed their score in the last 2 days:

Thanks for the graphs, I think this clearly shows there is a problem.

Have you studied the number of eras a validator is nominated (by the 1KV nominators) vs the score? Or regardless of the score. I think these distributions would be very insightful.
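As a rough sketch of the comparison I have in mind: count the eras in which each validator was nominated and put that next to its average score. The input shapes (`era -> list of stashes` and `stash -> list of score samples`) and the function names are assumptions for illustration; the data would have to be assembled from the endpoints or the chain first.

```python
from collections import Counter
from statistics import mean


def nomination_counts(nominations: dict[int, list[str]]) -> Counter:
    """Number of eras in which each stash was nominated."""
    counts: Counter = Counter()
    for stashes in nominations.values():
        counts.update(set(stashes))  # count a stash once per era
    return counts


def eras_vs_score(nominations: dict[int, list[str]],
                  scores: dict[str, list[float]]) -> list[tuple[str, int, float]]:
    """Rows of (stash, eras nominated, mean score), highest mean score first."""
    counts = nomination_counts(nominations)
    rows = [
        (stash, counts.get(stash, 0), mean(samples))
        for stash, samples in scores.items()
        if samples
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Placeholder data, not taken from the chain
nominations = {6420: ["a", "b"], 6421: ["a"], 6422: ["a", "c"]}
scores = {"a": [500, 520], "b": [800], "d": [900, 800]}
for stash, eras, avg in eras_vs_score(nominations, scores):
    print(f"{stash}: nominated in {eras} era(s), mean score {avg:.0f}")
```

If the selection follows the score, validators at the top of this table should also have the highest era counts.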

jimiflowers commented 3 months ago

> Thanks for the graphs, I think this clearly shows there is a problem.
>
> Have you studied the number of eras a validator is nominated (by the 1KV nominators) vs the score? Or regardless of the score. I think these distributions would be very insightful.

Now I will try to put score and nomination together in the same graphs to show the score of nominated candidates before, during and after the nomination. I will share the info when available.

ajk-code commented 3 months ago

@jimiflowers: Not sure what score metric you use. Note that the two score flatliners (ABGAR, ACTIVATOR) are actually invalid (valid=false). Here is an extract from the same data source (github candidates) using the score.total metric, intervals 4h, valid=true:

I think the scores are fairly dynamic and not limited to provider/location only.

jimiflowers commented 3 months ago

@ajk-code: if they became invalid, maybe that's why their graphs are flat lines, cos my script discards invalid candidates (they are not relevant to the point I am making).

Of course, score is not limited to provider/location. I only mentioned location/provider/nominators in my comment cos these parameters had not changed during the data collection shown in the attached graphs (score.total property fetched every 5 minutes from https://kusama.w3f.community/candidates and only from valid ones).
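For reference, the filtering step looks roughly like this. It is a simplified sketch, not my actual script: it assumes the /candidates response is a JSON array whose items carry `stash`, `valid` and a nested `score.total`, and the function name and sample payload are made up for the example.

```python
import json


def valid_total_scores(payload: str) -> dict[str, float]:
    """Map stash -> score.total for candidates marked valid.

    Candidates that are invalid or have no score.total are skipped.
    """
    result: dict[str, float] = {}
    for candidate in json.loads(payload):
        score = candidate.get("score")
        if candidate.get("valid") and isinstance(score, dict) and "total" in score:
            result[candidate["stash"]] = float(score["total"])
    return result


# Placeholder payload with made-up stashes and scores
sample = json.dumps([
    {"stash": "stash-A", "valid": True, "score": {"total": 804.0}},
    {"stash": "stash-B", "valid": False, "score": {"total": 611.0}},
    {"stash": "stash-C", "valid": True},
])
print(valid_total_scores(sample))
```

Each 5-minute poll produces one such mapping, and a candidate that turns invalid simply stops contributing new points from then on.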

I have never said that the score system is unfair. I am simply saying that it is weird, and I am trying to understand how and why the awarded score varies over time (and I think I am not the only one who has mentioned it). It would also be advisable to have additional information on the election of nodes, such as the specific score of each node at the exact time it is elected for nomination. That would give a better idea of how the process is carried out, since it runs at moments unknown to the participants, and the only thing we can see is that candidates with lower scores than others get selected, which is not logical if the nomination is based on score.

If everybody thinks all is fine and there is nothing to change, well, they can close the issue. But yesterday Michellis mentioned in the matrix group that there is a known issue with the score system that is being checked, so there might be something, don't you think?

ajk-code commented 3 months ago

@jimiflowers Could be that there are bugs, but it's not as easy to understand as it seems at first glance imho. The mathcrypto page is probably outdated (e.g. opengov scores) but may give some insights. The links to the code itself as single-source-of-truth are on the page as well.

lobis commented 3 months ago

> @jimiflowers Could be that there are bugs, but it's not as easy to understand as it seems at first glance imho. The mathcrypto page is probably outdated (e.g. opengov scores) but may give some insights. The links to the code itself as single-source-of-truth are on the page as well.

There definitely are bugs, such as https://github.com/w3f/1k-validators-be/issues/2816.

jimiflowers commented 3 months ago

@ajk-code: I have read the rules for how the score is assigned and I understand perfectly how it is supposed to work. I repeat that I don't see that (how the score system is supposed to work) as a problem, but if you are clear on how it all works and why some candidates are nominated and not others, please explain this to me so that I can understand it:

Candidate "EK-Kusama1" (DZzMSwXzbxhnCJePpzRKs1GD3yX25LP91y2Q9kFmPHXQ1vY) not nominated since era 6421, current score 804 points.

Candidate "Dot Plus/1" (G9fk8vjk2eiy65mQjog8MgPuLAHjjnagv4ixVzY7zRiEu3Z) currently nominated by G1Y1bvviE3VpDTm2dERe5xGiU2izNcJwYNHx95RJhqoWqqm with 509 points.

Here is the time window in which "Dot Plus/1" became nominated (green shading):

NOTE: Time is GMT+2

This is the kind of backend behavior that I don't understand and would like to understand. And I'm not saying it's right or wrong, I'm just saying I don't understand it.

w3f / 1k-validators-be

Add info about validators being nominated by 1KV accounts #2807