Problem
The node requesting repairs does not consider how many coding shreds it has already received for an FEC block. It checks for missing data shred indexes and requests repairs for all of them. However, some of those missing data shreds could be recovered via erasure coding if only a small subset of them were repaired. The current repair logic therefore puts extra load on network traffic and on peer nodes. Smarter repair logic could remove this load. Alternatively, the same number of repair requests and responses could be used to fill many more shred gaps.
Proposed Solution
1. Detect the missing data shred indexes.
2. Identify the FEC set that each missing data shred belongs to.
3. Count how many coding shreds have been received for that FEC set.
4. Calculate how many more data shreds are needed to recover the FEC block.
5. Request repair for a subset of the missing data indexes, such that (number of missing data shreds) >= (number of repair requests) >= (number of shreds needed for recovery).
Step 5 can take a more conservative approach and request repair for slightly more shreds than the minimum needed for recovery.
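The steps above can be sketched roughly as follows. This is a minimal illustration, not the existing repair code: `FecSetStatus`, its fields, and the `safety_margin` parameter are hypothetical names made up for this example. It assumes shreds are already grouped by FEC set (step 2), and that the erasure code can recover the whole set from any `num_data` shreds, data or coding, as Reed-Solomon does.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of what a node holds for one FEC set.
/// Names are illustrative and not taken from the actual codebase.
pub struct FecSetStatus {
    /// Index of the first data shred in the set.
    pub first_data_index: u64,
    /// Number of data shreds the set was encoded with.
    pub num_data: usize,
    /// Data shred indexes received so far.
    pub received_data: BTreeSet<u64>,
    /// Number of coding shreds received so far (step 3).
    pub num_coding_received: usize,
}

impl FecSetStatus {
    /// Step 1: data shred indexes of this set that have not arrived.
    pub fn missing_data_indexes(&self) -> Vec<u64> {
        let end = self.first_data_index + self.num_data as u64;
        (self.first_data_index..end)
            .filter(|index| !self.received_data.contains(index))
            .collect()
    }

    /// Step 4: how many more shreds are needed before recovery can run,
    /// assuming any `num_data` shreds of the set are sufficient.
    pub fn shreds_needed_for_recovery(&self) -> usize {
        let data_received = self.num_data - self.missing_data_indexes().len();
        let have = data_received + self.num_coding_received;
        self.num_data.saturating_sub(have)
    }

    /// Step 5: pick the missing data indexes to request.
    /// `safety_margin` asks for a few extra shreds beyond the minimum,
    /// capped at the number of missing data shreds.
    pub fn repair_requests(&self, safety_margin: usize) -> Vec<u64> {
        let needed = self.shreds_needed_for_recovery();
        if needed == 0 {
            // Recovery is already possible; nothing to request.
            return Vec::new();
        }
        let missing = self.missing_data_indexes();
        let count = (needed + safety_margin).min(missing.len());
        missing.into_iter().take(count).collect()
    }
}
```

For example, for a set of 32 data shreds where 20 data shreds and 8 coding shreds have arrived, 12 data shreds are missing but only 4 more are needed for recovery. In that case `repair_requests(0)` returns 4 indexes instead of 12, and `repair_requests(2)` returns 6.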
Tag: @sagar-solana @carllin @aeyakovenko