polkadot-fellows / RFCs

Proposals for change to standards administered by the Fellowship.
https://polkadot-fellows.github.io/RFCs/
Creative Commons Zero v1.0 Universal
115 stars 55 forks source link

Assignment of availability-chunk indices to validators #47

Closed alindima closed 8 months ago

alindima commented 10 months ago

Rendered

This RFC proposes a way of permuting the availability chunk indices assigned to validators for a given core and relay chain block, in the context of recovering available data from systematic chunks.

alindima commented 10 months ago

CC: @sandreim, @burdges, @ordian, @eskimor, @alexggh

alindima commented 10 months ago

Considering feedback and latest discoveries, I'll rewrite the proposal using ParaId instead of CoreIndex and using relay_parent_hash instead of (session_index, block_number)

alindima commented 10 months ago

Rewritten the proposal. Have a look

alindima commented 10 months ago

Thanks for the reviews and suggestions Andronik! Curious to see what others think as well!

alindima commented 10 months ago

Thanks to everybody for the feedback! Based on this and especially my discussions with @eskimor , I will modify the RFC so that:

  1. The shuffling function does not contain any randomness (or other complex third party libraries). This mitigates the problem of the rand crate changing implementations and enables us to not embed the algorithm in the runtime. We want an implementation that can be easily coded and has minimal points of failure in terms of updates/supply chain issues. It'll also make it easier for alternate clients to implement the same algorithm.
  2. The shuffling function uses core_index instead of para_id. This enables us to easily implement an even distribution, because core indices form a monotonically increasing sequence. At the same time, core_index cannot be chosen by a parachain to influence the chunk assignment.
  3. The ChunkFetchingRequest will not ask for a specific chunk index. Validators will respond with the chunk they should be holding according to the assignment function. This enables availability-recovery in scenarios where we don't know the mapping (for example for collators or disputes happening on unimported forks).
  4. Disputes protocol will not always be able to perform systematic recovery. Since not all disputes have easy access to the core_index occupied by a candidate, this would be very complicated to implement. We'll consider this acceptable, as it's a very rare circumstance. Regular systematic recovery will work as before.

Note that we'll keep asserting that enabling this feature needs to happen atomically between validators and collators.

One thing that I've not decided on is the exact assignment function implementation. I think it could be (inspired by other comments):

pub fn get_chunk_index(
  n_validators: u32,
  validator_index: ValidatorIndex,
  block_number: BlockNumber,
  core_index: CoreIndex
) -> ChunkIndex {
  let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
  let core_start_pos = abs(core_index - block_number) * threshold;

  (core_start_pos + validator_index) % n_validators
}
ordian commented 10 months ago

The ChunkFetchingRequest will not ask for a specific chunk index. Validators will respond with the chunk they should be holding according to the assignment function. This enables availability-recovery in scenarios where we don't know the mapping (for example for collators or disputes happening on unimported forks).

Do you mean introducing v3 version of the req-resp protocol? And how would that work for availability when we need to get our chunk from backers?

As a result, the implementation of this RFC will no longer be a breaking change for collators.

If the answer to the previous question is yes, I don't see how this is not a breaking change. If no, the receiving side is validating the chunk index matches the response, so I don't see how it's not a breaking change.

I think it would be reasonable to introduce req-resp v3 and later after some time enable this assignment via the runtime feature flag.

alindima commented 10 months ago

Do you mean introducing v3 version of the req-resp protocol?

No, I didn't want to add a new version to the protocol. The idea was that we assume all validators have upgraded to have the same validator->index mapping and we also assume that two thirds of them are honest and responded with the right chunk they should be holding according to the mapping (but we don't check that they responded with the right chunk, because we may not know it beforehand).

And how would that work for availability when we need to get our chunk from backers?

That's a good point. Same as before. The validator would issue the request, and in this specific case also check the index.

The idea is that validators will usually check the indices they get (during availability-distribution or systematic recovery during approval checking), but the regular recovery will also work when the entity (collator or validator participating in a dispute) does not know/check the indices against the canonical mapping.

If no, the receiving side is validating the chunk index matches the response, so I don't see how it's not a breaking change.

you're right, it's a breaking change. I guess what I meant is that, after the upgrade, regular recovery will still work even if the collator does not have access to the canonical mapping.

I'll reword my comment.

I think it would be reasonable to introduce req-resp v3 and later after some time enable this assignment via the runtime feature flag.

Experimenting a bit more with what I proposed above, I realised that we either bump the protocol or we do some other arguably hacky way of discovering the chunk index on the receveing side (iterating through keys after reconstructing the partial trie from the proof and checking which value matches the chunk hash).

Considering that you called out the fact that even not bumping the protocol version would still be a breaking change to collators, I think it's best to bump the protocol version and add the chunk_index to the response. This new protocol version would also have the semantic meaning that the requester MAY not always check the chunk index against the canonical mapping

eskimor commented 10 months ago

At the same time, core_index cannot be chosen by a parachain to influence the chunk assignment.

This is actually not true with agile core time: A parachain bids specifically on one particular core.

But it also should not matter much: We are rotating the mapping every block. On top of that, there is only so much a parachain can do here anyway.

The ChunkFetchingRequest will not ask for a specific chunk index. Validators will respond with the chunk they should be holding according to the assignment function. This enables availability-recovery in scenarios where we don't know the mapping (for example for collators or disputes happening on unimported forks).

This is not what we discussed. I suggested to keep requesting via validator index. Right now we also store in the av-store per validator index - if we don't change that, then a node can provide a chunk as requested by validator index, even if it does not know the mapping itself*). Hence everything would keep working with neither the requester nor the responder still knowing the mapping.

*) Assuming the chunk data itself gets stored together with the chunk index.

Once we have the core index in the receipt, this would strictly speaking no longer be necessary, but I also don't see much harm and if we really wanted to, we could change the network protocol again once we have this. Given that changing the candidate receipt is already a pita ... would not be that much of an additional annoyance.

burdges commented 10 months ago

At the same time, core_index cannot be chosen by a parachain to influence the chunk assignment.

This is actually not true with agile core time: A parachain bids specifically on one particular core.

But it also should not matter much: We are rotating the mapping every block. On top of that, there is only so much a parachain can do here anyway.

It's likely fine. We want the systemic chunks spread evenly enough to minimize other protocol manipulation. It's possible some parachains chose their cores to make frierndly validators more profitable, which overworks some other validators, and causes reconstructions. It's probably fine though.

alindima commented 10 months ago

This is actually not true with agile core time: A parachain bids specifically on one particular core. But it also should not matter much: We are rotating the mapping every block. On top of that, there is only so much a parachain can do here anyway.

Noted 👍🏻

This is not what we discussed. I suggested to keep requesting via validator index. Right now we also store in the av-store per validator index - if we don't change that, then a node can provide a chunk as requested by validator index, even if it does not know the mapping itself*). Hence everything would keep working with neither the requester nor the responder still knowing the mapping.

Yes, that's what I meant - requesting by validator index just as before. About storing in the DB by validator index - that would still require that one of the actors (requester/responder) needs to know the chunk index because the reconstruction algorithm needs the indices. As discussed, the requester may not know this, so that's why I suggested we return the ChunkIndex in the ChunkFetchingResponse. Whether we store in the DB the chunk index or the validator index I don't think is very relevant, but using the chunk index would be easier to use IMO

alindima commented 9 months ago

I think that the RFC is in a form that can be proposed and voted on. Please give another review and approve it if you think so.

eskimor commented 8 months ago

/rfc propose

paritytech-rfc-bot[bot] commented 8 months ago

Hey @eskimor, here is a link you can use to create the referendum aiming to approve this RFC number 0047.

Instructions 1. Open the [link](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-collectives-rpc.polkadot.io#/extrinsics/decode/0x3d003e02015901000049015246435f415050524f564528303034372c63333662316233363963616465633465616134363637336439383030373633383164393639656432323834346661383731616563393634616433303261663539290100000000). 2. Switch to the `Submission` tab. 3. Adjust the transaction if needed (for example, the proposal Origin). 4. Submit the Transaction

It is based on commit hash 4ae75296bfdeb1b2ca1e9d20f78c1a783475de47.

The proposed remark text is: RFC_APPROVE(0047,c36b1b369cadec4eaa46673d980076381d969ed22844fa871aec964ad302af59).

alindima commented 8 months ago

/rfc process

paritytech-rfc-bot[bot] commented 8 months ago

Please provider a block hash where the referendum confirmation event is to be found. For example:

/rfc process 0x39fbc57d047c71f553aa42824599a7686aea5c9aab4111f6b836d35d3d058162
Instructions to find the block hashHere is one way to find the corresponding block hash. 1. Open the referendum on Subsquare. 2. Switch to the `Timeline` tab. --- 3. Go to the details of the `Confirmed` event. --- 2. Go to the details of the block containing that event. --- 2. Here you can find the block hash.
alindima commented 8 months ago

/rfc process 0xe750513eb583410686c094e8e5f00c0b18dc33e0296c1c8572c2e83f021c03a5

paritytech-rfc-bot[bot] commented 8 months ago

The on-chain referendum has approved the RFC.