Repair request can potentially be throttled

Problem

The node requesting repairs does not consider how many coding shreds it has already received for an FEC block. It checks for missing data indexes and requests repairs for all of them. However, some of those missing data shreds could be recovered via erasure coding if only a small subset of them were repaired. The current repair logic therefore puts extra load on the network and on peer nodes, and smarter request logic could remove it. Alternatively, the same number of repair requests and responses could be used to fill many more shred gaps.

Proposed Solution

1. Detect the missing data shred indexes.
2. Identify the FEC set that each missing data shred belongs to.
3. Count how many coding shreds have been received for that FEC set.
4. Calculate how many more data shreds are needed to recover the FEC block.
5. Request repair for a subset of the missing data indexes, such that number of missing indexes >= number of requests >= number of shreds needed for recovery.

The fifth step can take a more conservative approach and request repair for a slightly higher number of shreds than strictly needed.
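
The steps above could be sketched roughly as follows. This is a minimal illustration, not the existing repair code: `FecSetStatus`, `repairs_to_request`, and the `margin` parameter are hypothetical names, and the sketch assumes that an FEC set with N data shreds can be recovered once any N of its data and coding shreds are present.

```rust
/// Hypothetical summary of what a node holds for one FEC set.
/// (Assumed for illustration; not a type from the codebase.)
pub struct FecSetStatus {
    /// received_data[i] is true if data shred i of the set is present.
    pub received_data: Vec<bool>,
    /// Number of coding shreds received so far for this set.
    pub received_coding: usize,
}

/// Returns the offsets (within the FEC set) of the data shreds to request.
/// `margin` is the number of extra shreds to request beyond the minimum,
/// which corresponds to the more conservative variant of the last step.
pub fn repairs_to_request(status: &FecSetStatus, margin: usize) -> Vec<usize> {
    // Step 1: find the missing data shred offsets.
    let missing: Vec<usize> = status
        .received_data
        .iter()
        .enumerate()
        .filter_map(|(i, &present)| if present { None } else { Some(i) })
        .collect();

    // Steps 3 and 4: count everything received and compute the shortfall.
    // Assumption: any `num_data` shreds (data or coding) suffice for recovery.
    let num_data = status.received_data.len();
    let received = (num_data - missing.len()) + status.received_coding;
    let needed = num_data.saturating_sub(received);

    if needed == 0 {
        // The set is already recoverable via erasure coding; request nothing.
        return Vec::new();
    }

    // Step 5: request at least `needed`, at most all missing shreds.
    let count = (needed + margin).min(missing.len());
    missing.into_iter().take(count).collect()
}
```

For example, with 8 data shreds of which 3 are missing and 2 coding shreds received, only 1 repair is strictly needed instead of 3. Calling the function with `margin = 1` would request 2.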

Tag: @sagar-solana @carllin @aeyakovenko

solana-labs / solana

Repair request can potentially be throttled #7473
