Make it possible to analyze data related to validator activity

MonsieurNicolas commented 1 month ago

What problem does your feature solve?

One of the foundations of SCP is that entities can adjust who they trust as validators on the network.

In a system built on trust, there need to be mechanisms to verify and audit that entities are behaving the way they claim to be.

What would you like to see?

The data set relevant to this issue is:

which validator "won" nomination (validators sign the transaction set they introduce)
the mapping at the time consensus took place between a validator key and the entity it represents (stellar.toml)
the exact structure of the generalized transaction set that the winning validator introduced
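As a rough illustration of how these three pieces could sit together in one record per ledger, here is a minimal sketch. The class and field names (`NominationRecord`, `TxSetComponent`, and so on) are hypothetical and not taken from stellar-core or stellar-etl.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TxSetComponent:
    """One component of a generalized transaction set (hypothetical shape)."""
    phase: str      # e.g. "classic" or "soroban"
    base_fee: int   # base fee applied to the transactions in this component
    tx_count: int
    op_count: int


@dataclass
class NominationRecord:
    """Per-ledger record tying the winning validator to what it introduced."""
    ledger_sequence: int
    winning_validator_key: str   # key that signed the winning transaction set
    entity: Optional[str]        # entity per stellar.toml at consensus time, if known
    components: List[TxSetComponent] = field(default_factory=list)

    def total_txs(self) -> int:
        # Total transactions across all components of the winning set.
        return sum(c.tx_count for c in self.components)
```

Keeping `entity` optional reflects that the key-to-entity mapping may not be resolvable for every ledger.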

Some of the reports that become possible:

audit that there is no bias for or against specific entities and, if some bias exists, understand it better so that validators can take corrective action
- some entities may be biased against because of performance issues, for example
- some entities may be biased toward because of a bug somewhere in the stack
make more transparent situations where validators are traffic shaping in some way
- penalizing single transactions (create a "surge pricing group of one transaction") - as identified in https://github.com/stellar/stellar-protocol/blob/master/core/cap-0042.md#security-concerns
- giving priority to certain tokens/trades (MEV-exploitation-like behavior)
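A first version of the bias report could simply count how often each entity's validators win nomination. The sketch below assumes two hypothetical inputs: a list of winning validator keys (one per ledger) and a key-to-entity mapping. It does not model what share each entity would be expected to win, so it is only a starting point for spotting anomalies.

```python
from collections import Counter


def nomination_win_share(winners, key_to_entity):
    """Return the fraction of ledgers won by each entity.

    winners: iterable of validator public keys, one per closed ledger.
    key_to_entity: dict mapping validator key -> entity name (assumed input).
    Keys missing from the mapping are grouped under "unknown".
    """
    counts = Counter(key_to_entity.get(key, "unknown") for key in winners)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {entity: n / total for entity, n in counts.items()}


# Usage with made-up keys and entities:
winners = ["GA1", "GA2", "GB1", "GA1"]
mapping = {"GA1": "org-a", "GA2": "org-a", "GB1": "org-b"}
shares = nomination_win_share(winners, mapping)
```

Comparing these shares across time windows would show whether an entity's share shifts after a release or a performance incident.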

What alternatives are there?

None, other than implementing some other data lake that would contain this data.

MonsieurNicolas commented 1 month ago

@anupsdf FYI

anupsdf commented 1 month ago

the exact structure of the generalized transaction set that the winning validator introduced

Also, how about adding some metrics for each lane (tx set) from the winning validator's mempool, such as total transactions, min fee, and max fee? This would help analyze the separate lanes and give insight into any biases or bugs when different tx sets are proposed.

MonsieurNicolas commented 1 month ago

@anupsdf the mempool structure is out of scope of this. I'd like to keep this to what can be verified based on what goes through consensus.

That being said, we should be able to have reports per tier 1 organization of how fees are distributed and whether lanes are being used.

sydneynotthecity commented 3 weeks ago

the mapping at the time consensus took place between a validator key and the entity it represents (stellar.toml)

@MonsieurNicolas or @anupsdf is this information written to the ledger close meta at the time of consensus? Or would this require us to fetch the stellar.toml at runtime to get the information?

(cc @chowbao)

ire-and-curses commented 3 weeks ago

the mapping at the time consensus took place between a validator key and the entity it represents (stellar.toml)

There doesn't seem to be any straightforward and efficient way today to construct this mapping.

Stellar tomls live on websites, and describe validators. These websites aren't particularly discoverable, and in particular are not discoverable from on-chain data.
SCP messages provide validator public keys but there's no guarantee that a given website publishing a toml claiming to control that key actually does so.

It seems to me that a prerequisite for this work would be hardening SEP 1. If standard on-chain data entries published the hash of a toml and also the toml location off-chain, and if the toml published the validator information and the account, then:

This is self-consistent and doesn't need to be checked on every SCP message - it is enough to check for changes to the data entries
Scanning on-chain data would yield a complete list of tomls, and therefore a complete list of validator entities could be easily constructed.
As an additional benefit, all auxiliary data in the toml would be verifiable, and changes to the toml would require changes to the corresponding data entry, thus forming a traceable and transparent record. Contrast this with today's situation, where a toml file can change dynamically at any time, with no visible revision history.
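To make the proposal concrete, here is a minimal sketch of the check it implies. Everything here is an assumption: SEP 1 does not define such a data entry today, SHA-256 is just one plausible hash choice, and the function names and snapshot format are hypothetical.

```python
import hashlib


def toml_matches_onchain_hash(toml_bytes: bytes, onchain_hash_hex: str) -> bool:
    """Check a fetched toml against the hash published in a data entry.

    Assumes the entry stores a hex-encoded SHA-256 of the raw toml bytes.
    """
    digest = hashlib.sha256(toml_bytes).hexdigest()
    return digest == onchain_hash_hex.lower()


def changed_entries(previous: dict, current: dict) -> dict:
    """Return accounts whose published toml hash is new or has changed.

    previous/current: dicts mapping account ID -> toml hash, taken from two
    snapshots of the (hypothetical) data entries. Only these accounts need
    their toml re-fetched and re-verified.
    """
    return {acct: h for acct, h in current.items() if previous.get(acct) != h}
```

Because the toml is only re-fetched when a data entry changes, the verification cost scales with the number of toml revisions rather than with the number of SCP messages.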

sydneynotthecity commented 1 week ago

The data set relevant to this issue is:

the exact structure of the generalized transaction set that the winning validator introduced

@MonsieurNicolas when you say exact structure of the generalized transaction set, do you want an aggregated view of the transaction set at the ledger level? For example:

classic phase: 2 component(s): [{discounted txs:24, ops:152, base_fee:100}, {discounted txs:429, ops:500, base_fee:51234}]
soroban phase: 1 component(s): [{discounted txs:1, ops:1, base_fee:100}]

We save the fee data at the transaction level, so we can derive the base fee and the count of transactions for each base fee. But for ease of use, we can also aggregate that data at the ledger level and append it to the table.
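A ledger-level aggregation along these lines could look like the sketch below. The row keys (`phase`, `base_fee`, `op_count`) are hypothetical column names, and grouping by (phase, base fee) only approximates the real components: it would merge two components that happen to share a base fee.

```python
from collections import defaultdict


def aggregate_tx_set(tx_rows):
    """Roll transaction-level rows for one ledger up into per-phase components.

    tx_rows: iterable of dicts with hypothetical keys
             'phase' (str), 'base_fee' (int), 'op_count' (int).
    Returns {phase: [{'txs': ..., 'ops': ..., 'base_fee': ...}, ...]},
    with components ordered by base fee within each phase.
    """
    groups = defaultdict(lambda: {"txs": 0, "ops": 0})
    for row in tx_rows:
        group = groups[(row["phase"], row["base_fee"])]
        group["txs"] += 1
        group["ops"] += row["op_count"]

    summary = defaultdict(list)
    for (phase, base_fee), group in sorted(groups.items()):
        summary[phase].append(
            {"txs": group["txs"], "ops": group["ops"], "base_fee": base_fee}
        )
    return dict(summary)
```

The length of each phase's list gives the component count shown in the example above.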

stellar / stellar-etl