solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0

Repair request can potentially be throttled #7473

Open pgarg66 opened 4 years ago

pgarg66 commented 4 years ago

Problem

The node requesting repairs does not consider how many coding shreds it has already received for an FEC block. It checks for missing data indexes and requests repairs for all of them. However, some of those missing data shreds could be recovered via erasure coding if only a small subset of them were repaired. The current repair logic therefore puts an unnecessary burden on network traffic and on peer nodes. More intelligence could be added here to remove this burden. Alternatively, the same number of repair requests/responses could be used to fill many more shred gaps.

Proposed Solution

  1. Detect the missing data shred indexes
  2. Identify the FEC set that each missing data shred belongs to
  3. Count how many coding shreds have been received for that FEC set
  4. Calculate how many more data shreds are needed to recover the FEC block
  5. Request repair for a subset of the missing data indexes, such that (number of missing indexes) >= (number of repair requests) >= (number of shreds needed for recovery)

Step 5 can take a more conservative approach and request repair for a slightly higher number of shreds than strictly needed.
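
A rough sketch of steps 3 to 5 for a single FEC set, to make the arithmetic concrete. This is not the existing repair code: `FecSetStatus`, `repairs_to_request` and `safety_margin` are hypothetical names, and it assumes a Reed-Solomon style code where any `num_data` shreds of the set (data or coding) are enough to recover the rest.

```rust
/// Hypothetical summary of what the node currently holds for one FEC set.
struct FecSetStatus {
    /// Number of data shreds in the FEC set.
    num_data: usize,
    /// Indexes of the data shreds that are still missing.
    missing_data: Vec<u64>,
    /// Number of coding shreds received so far for this set.
    received_coding: usize,
}

/// Returns the subset of missing data indexes to request repair for.
/// Assumption: recovery succeeds once `num_data` shreds of any kind are held.
fn repairs_to_request(status: &FecSetStatus, safety_margin: usize) -> Vec<u64> {
    let missing = status.missing_data.len();
    let received_data = status.num_data.saturating_sub(missing);
    let received_total = received_data + status.received_coding;

    // Step 4: shreds still needed before erasure recovery becomes possible.
    let needed = status.num_data.saturating_sub(received_total);
    if needed == 0 {
        // Enough shreds are already held; recovery needs no repair traffic.
        return Vec::new();
    }

    // Step 5: request a few extra as a margin, but never more than are missing,
    // so that missing >= requested >= needed always holds.
    let requested = (needed + safety_margin).min(missing);
    status
        .missing_data
        .iter()
        .take(requested)
        .copied()
        .collect()
}
```

For example, with 32 data shreds, 5 of them missing and 3 coding shreds received, the node holds 30 shreds and needs only 2 more, so a margin of 0 yields 2 repair requests instead of 5. Which of the missing indexes to pick is left open here; the sketch simply takes the first ones.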

Tag: @sagar-solana @carllin @aeyakovenko

pgarg66 commented 4 years ago

@carllin, @sagar-solana would making this change complicate the current repair-related issues that we are seeing in the TdS/SLP clusters?