behzadnouri opened this issue 2 years ago (status: Open)
I'm observing on the order of 100 shred repair requests being issued per second, but the vast majority of these end up not getting inserted into the blockstore because they already exist (propagated through turbine or recovery eventually). Only ~1 repair per second actually gets inserted into blockstore.
As an experiment, I increased the MAX_TURBINE_PROPAGATION_IN_MS constant, which controls how long we'll wait after the "expected" shred arrival time before issuing a repair request. Increasing this time had a large impact on both the total number of repairs issued and the number of necessary repairs, as shown in the table below:
Note that @steviez collected a similar set of data in #28634 that shows repaired shreds colliding with turbine/recovery
I believe this answers a large part of the question posed by this issue. What is still not clear to me is whether the remaining successful repairs are absolutely necessary (or whether turbine/recovery would have gotten those shreds into the blockstore eventually) and whether the number makes sense given expected packet loss.
A few more interesting observations: a number of these repairs are issued as the HighestShred repair type through the generate_repairs_for_slot function. These repairs seem unnecessary, as we haven't waited very long at all for turbine shreds to show up.

Here's an interesting log snippet detailing the evolution of slot 168239177, FEC set 96, which ultimately had 1 successful repair for shred 109:
[2022-12-21T20:12:10.228677416Z WARN solana_ledger::blockstore] BWLOG: StillNeed 31 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.242342641Z WARN solana_ledger::blockstore] BWLOG: StillNeed 30 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.257640999Z WARN solana_ledger::blockstore] BWLOG: StillNeed 29 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.284582242Z WARN solana_ledger::blockstore] BWLOG: StillNeed 28 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.289387464Z WARN solana_ledger::blockstore] BWLOG: StillNeed 27 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.294383968Z WARN solana_ledger::blockstore] BWLOG: StillNeed 25 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.295578511Z WARN solana_ledger::blockstore] BWLOG: StillNeed 24 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.299210581Z WARN solana_ledger::blockstore] BWLOG: StillNeed 23 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.300530091Z WARN solana_ledger::blockstore] BWLOG: StillNeed 22 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.302735578Z WARN solana_ledger::blockstore] BWLOG: StillNeed 21 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.308923790Z WARN solana_ledger::blockstore] BWLOG: StillNeed 20 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.312854886Z WARN solana_ledger::blockstore] BWLOG: StillNeed 19 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.313418614Z WARN solana_ledger::blockstore] BWLOG: StillNeed 18 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.321557781Z WARN solana_ledger::blockstore] BWLOG: StillNeed 17 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.322756622Z WARN solana_ledger::blockstore] BWLOG: StillNeed 16 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.322915874Z WARN solana_ledger::blockstore] BWLOG: StillNeed 15 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.325492725Z WARN solana_ledger::blockstore] BWLOG: StillNeed 14 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.335898856Z WARN solana_ledger::blockstore] BWLOG: StillNeed 13 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.361718104Z WARN solana_ledger::blockstore] BWLOG: StillNeed 12 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.363473669Z WARN solana_ledger::blockstore] BWLOG: StillNeed 11 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.448763466Z WARN solana_ledger::blockstore] BWLOG: StillNeed 10 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.500319796Z WARN solana_core::repair_service] BWLOG: generate_repairs_for_slot requesting 21 repairs: [96, 97, 98, 100, 102, 103, 104, 105, 109, 112, 113, 114, 116, 117, 118, 121, 122, 123, 125, 126, 127] from slot 168239177. 328 ms since first shred, consumed = 96, received = 479, completed = {0, 1, 2, 7, 8, 9, 26, 31, 34, 37, 39, 41, 42, 44, 46, 50, 51, 53, 54, 57, 59, 61, 64, 66, 70, 86, 88, 89, 90, 94, 95, 106, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 142, 143, 145, 146, 148, 149, 169, 171, 172, 174, 175, 176, 178, 180, 181, 183, 184, 186, 187, 189, 190, 191, 192, 193, 194, 196, 198, 199, 200, 201, 202, 203, 222, 224, 225, 226, 228, 230, 231, 232, 234, 235, 236, 238, 241, 244, 264, 265, 267, 269, 270, 271, 272, 273, 274, 276, 277, 278, 299, 300, 301, 302, 304, 305, 307, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 323, 325, 326, 327, 347, 349, 350, 352, 353, 355, 356, 358, 359, 360, 362, 363, 365, 367, 368, 370, 371, 372, 373, 393, 394, 395, 396, 398, 399, 400, 401, 402, 404, 405, 407, 436, 464, 478}
[2022-12-21T20:12:10.553677206Z WARN solana_ledger::blockstore] BWLOG: successful repair insert for slot 168239177, shred 109
[2022-12-21T20:12:10.553682957Z WARN solana_ledger::blockstore] BWLOG: StillNeed 9 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.586413656Z WARN solana_ledger::blockstore] BWLOG: StillNeed 8 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.595120939Z WARN solana_ledger::blockstore] BWLOG: StillNeed 7 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.596805671Z WARN solana_ledger::blockstore] BWLOG: StillNeed 6 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.604588493Z WARN solana_core::repair_service] BWLOG: generate_repairs_for_slot requesting 20 repairs: [96, 97, 98, 100, 102, 103, 104, 105, 112, 113, 114, 116, 117, 118, 121, 122, 123, 125, 126, 127] from slot 168239177. 432 ms since first shred, consumed = 96, received = 514, completed = {0, 1, 2, 7, 8, 9, 26, 31, 34, 37, 39, 41, 42, 44, 46, 50, 51, 53, 54, 57, 59, 61, 64, 66, 70, 86, 88, 89, 90, 94, 95, 106, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 142, 143, 145, 146, 148, 149, 169, 171, 172, 174, 175, 176, 178, 180, 181, 183, 184, 186, 187, 189, 190, 191, 192, 193, 194, 196, 198, 199, 200, 201, 202, 203, 222, 224, 225, 226, 228, 230, 231, 232, 234, 235, 236, 238, 241, 244, 264, 265, 267, 269, 270, 271, 272, 273, 274, 276, 277, 278, 299, 300, 301, 302, 304, 305, 307, 310, 311, 312, 314, 315, 316, 317, 318, 319, 320, 321, 323, 325, 326, 327, 347, 349, 350, 352, 353, 355, 356, 358, 359, 360, 362, 363, 365, 367, 368, 370, 371, 372, 373, 393, 394, 395, 396, 398, 399, 400, 401, 402, 404, 405, 407, 428, 430, 432, 433, 434, 436, 437, 438, 439, 441, 442, 443, 444, 445, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 480, 481, 483, 484, 486, 507, 508, 509, 512, 513}
[2022-12-21T20:12:10.610407315Z WARN solana_ledger::blockstore] BWLOG: StillNeed 5 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.622011796Z WARN solana_ledger::blockstore] BWLOG: StillNeed 4 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.645742447Z WARN solana_ledger::blockstore] BWLOG: StillNeed 3 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.650798444Z WARN solana_ledger::blockstore] BWLOG: StillNeed 2 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.651050301Z WARN solana_ledger::blockstore] BWLOG: StillNeed 1 ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.652664910Z WARN solana_ledger::blockstore] BWLOG: CanRecover ErasureSetId(168239177, 96) Index { slot: 168239177, data: ShredIndex { index: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 99, 101, 106, 107, 108, 109, 110, 111, 115, 116, 119, 120, 121, 124, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513} }, coding: ShredIndex { index: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 67, 68, 69, 71, 72, 73, 74, 75, 82, 84, 87, 88, 91, 95, 96, 97, 98, 99, 100, 101, 103, 104, 106, 110, 113, 114, 115, 116, 119, 122, 124, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 158, 159, 160, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 
251, 252, 253, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 276, 277, 278, 279, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 403, 404, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 447, 448, 449, 450, 451, 452, 453, 454, 456, 457, 458, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 482, 487, 496, 509, 513, 514, 515, 516, 517, 519, 520, 521, 522, 524, 525, 527, 528, 529, 530, 531, 534, 535, 536, 537, 538, 540, 541, 542, 543, 545, 546, 547, 548, 551, 552, 553, 554, 555, 556, 557, 558, 559, 562, 564, 565, 566, 567, 569, 571, 572, 573} } }
[2022-12-21T20:12:10.654743738Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.656298934Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.656995814Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.662231711Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.664292144Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.665374124Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.667918614Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.672956958Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.682359378Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.700957196Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.739988039Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.740975049Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.842044797Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
[2022-12-21T20:12:10.895988728Z WARN solana_ledger::blockstore] BWLOG: DataFull ErasureSetId(168239177, 96)
2 actions that can be taken to cut down on repair related network traffic:
1. Increase MAX_TURBINE_PROPAGATION_IN_MS so that we wait longer for shreds to show up through turbine before requesting repair. I was seeing a large reduction in requests without any visible impact to slot full times when increasing from 100 to 200.
2. Only issue HighestShred repairs through generate_repairs_for_slot once the shred has surpassed the turbine timeout (see the sketch below).
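A rough sketch of what that second gate could look like (illustrative only; the types and function here are made up and are not the actual repair_service code):

```rust
// Illustrative sketch of action 2, not the actual generate_repairs_for_slot:
// only fall back to a HighestShred request once the slot has gone quiet for
// longer than the turbine propagation timeout, instead of asking immediately.
const MAX_TURBINE_PROPAGATION_IN_MS: u64 = 100;

#[derive(Debug)]
struct HighestShredRepair {
    slot: u64,
    shred_index: u64, // first index past the highest shred we have received
}

fn maybe_request_highest_shred(
    slot: u64,
    highest_received_index: u64,
    ms_since_last_shred: u64,
) -> Option<HighestShredRepair> {
    // Give turbine a chance to deliver the trailing shreds before asking peers.
    if ms_since_last_shred <= MAX_TURBINE_PROPAGATION_IN_MS {
        return None;
    }
    Some(HighestShredRepair {
        slot,
        shred_index: highest_received_index + 1,
    })
}

fn main() {
    // 50ms of silence: still within the turbine window, so no request yet.
    assert!(maybe_request_highest_shred(168239177, 479, 50).is_none());
    // 250ms of silence: issue the HighestShred repair.
    println!("{:?}", maybe_request_highest_shred(168239177, 479, 250));
}
```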
Are the stats gathered in https://github.com/solana-labs/solana/issues/28638#issuecomment-1362099338 with the current 100ms MAX_TURBINE_PROPAGATION_IN_MS or some other setup?
packet success within a shred has some amount of correlation for whatever reason.
within a "batch"? not "shred", right?
Increase MAX_TURBINE_PROPAGATION_IN_MS so that we wait for longer for shreds
any idea how far we can push this before it starts negatively impacting node? also, wondering if we can design a more robust mechanism than just an arbitrary wait time.
Are the stats gathered in #28638 (comment) with the current 100ms MAX_TURBINE_PROPAGATION_IN_MS or some other setup?

They are gathered with MAX_TURBINE_PROPAGATION_IN_MS set to 200. However, I also have stats gathered with the default 100, and the output looks almost the same. The only real visible change I can pick up on is fewer repair requests being made, leading to fewer blockstore insertion failures for "already exists."
within a "batch"? not "shred", right?
Good catch. I think I meant to say "erasure set" here.
any idea how far we can push this before it starts negatively impacting node? also, wondering if we can design a more robust mechanism than just an arbitrary wait time.
Not completely sure. 200 seems to cut down the noise by an order of magnitude without impacting slot times. I previously saw further repair request reduction moving to 400, but I didn't capture block insertion times for that run.
I actually don't hate the current wait mechanism. It uses a reference time for each shred as an offset from the first shred received time + some arbitrary turbine delay time.
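For reference, a simplified model of that mechanism (the constant matches the default discussed here, but the tick math and function are stand-ins, not the actual blockstore API):

```rust
// Simplified model of the existing wait mechanism: each shred's "expected"
// arrival is the slot's first-shred receive time plus the shred's offset into
// the slot (its reference tick), and repair is only requested once we are more
// than MAX_TURBINE_PROPAGATION_IN_MS past that expected arrival.
const MAX_TURBINE_PROPAGATION_IN_MS: u64 = 100;
const MS_PER_TICK: u64 = 400 / 64; // ~6ms: 400ms slot / 64 ticks (rough approximation)

fn should_request_repair(now_ms: u64, first_shred_recv_ms: u64, reference_tick: u64) -> bool {
    let expected_arrival_ms = first_shred_recv_ms + reference_tick * MS_PER_TICK;
    now_ms > expected_arrival_ms + MAX_TURBINE_PROPAGATION_IN_MS
}

fn main() {
    let first_shred_recv_ms = 1_000;
    // A shred expected ~30 ticks into the slot is not repair-eligible at +150ms...
    assert!(!should_request_repair(1_150, first_shred_recv_ms, 30));
    // ...but becomes eligible once we are well past its expected arrival + timeout.
    assert!(should_request_repair(1_400, first_shred_recv_ms, 30));
    println!("ok");
}
```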
This is another idea we can try:
1. On a given node (the repairer), monitor how many repaired shreds are already in the blockstore by the time the repaired shred is received. We are already tracking this.
2. If the ratio of already present shred count / repaired shred count goes above a certain threshold, slow down on repair requests. If the ratio goes below a certain threshold, make repair requests more aggressive.
3. This can be done by replacing MAX_TURBINE_PROPAGATION_IN_MS with a dynamically calculated value in the repair algorithm, or probably by a cleaner refactor of the code.

The idea is that each node has somewhat unique network connectivity (latency/throughput/jitter), and this dynamic adjustment will allow it to monitor and tune its repairs (a rough sketch follows below).

Thoughts? @behzadnouri, @bw-solana
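To make the proposal concrete, a hypothetical sketch (all names, thresholds, and step sizes are placeholders, not an agreed design), including the upper/lower bounds discussed further down:

```rust
// Hypothetical sketch of the dynamic adjustment proposed above: the repair
// timeout starts at today's constant, is nudged up when most repaired shreds
// turn out to be redundant, nudged down when almost none are, and clamped so
// the behavior stays within predictable bounds.
struct AdaptiveRepairTimeout {
    timeout_ms: u64,
}

impl AdaptiveRepairTimeout {
    const MIN_MS: u64 = 100; // never more aggressive than the current default
    const MAX_MS: u64 = 400; // cap how long we are willing to defer repairs
    const STEP_MS: u64 = 25;

    fn new() -> Self {
        Self { timeout_ms: Self::MIN_MS }
    }

    // Called once per metrics interval with counts from blockstore insertion,
    // e.g. num_repaired_data_shreds_exists vs. total repaired shreds received.
    fn update(&mut self, already_present: u64, total_repaired: u64) {
        if total_repaired == 0 {
            return; // nothing to learn from this interval
        }
        let redundant_ratio = already_present as f64 / total_repaired as f64;
        if redundant_ratio > 0.5 {
            // Mostly wasted repairs: wait longer before requesting.
            self.timeout_ms = (self.timeout_ms + Self::STEP_MS).min(Self::MAX_MS);
        } else if redundant_ratio < 0.1 {
            // Repairs are mostly useful: it is safe to be more aggressive.
            self.timeout_ms = self.timeout_ms.saturating_sub(Self::STEP_MS).max(Self::MIN_MS);
        }
    }
}

fn main() {
    let mut timeout = AdaptiveRepairTimeout::new();
    timeout.update(90, 100); // 90% redundant -> back off
    assert_eq!(timeout.timeout_ms, 125);
    timeout.update(2, 100); // 2% redundant -> more aggressive again
    assert_eq!(timeout.timeout_ms, 100);
    println!("timeout = {}ms", timeout.timeout_ms);
}
```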
Makes sense to me. We should try it out on a few nodes.
What's a good target redundant repair shred rate? 50%?
I can hack up something and try on monogon cluster first.
I am not against trying this out to see if it results in an improvement or any interesting insights.
However, please keep in mind that these kinds of dynamic adjustments tend to become very difficult to predict or debug when things break, especially when there are a lot of other dynamic components or moving parts that they interact with.
Noted. Would keeping dynamism in check with upper/lower bounds be a way forward?
sure, let's run some experiments and evaluate. thank you
I added some counters to see how many duplicate shreds are because of repair. Running on monogon, with 100 nodes, 1 client, and --bench-tps-args 'tx-count=15000 threads=4 thread-batch-sleep-ms=0', produces this result.

The counter is in core/src/window_service.rs, in the PossibleDuplicateShred::Exists(shred) arm, with some tweaks to pass along the repaired status (a simplified sketch is included below).

I noticed some deduping in sigverify_stage as well, but in my experiments none of those are due to repairs. That deduping is happening within a smaller time window, though, so it may not catch anything relevant wrt repairs.
Any thoughts about the analysis?
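For context, a toy version of the kind of counter being described (the enum and metrics struct here are simplified stand-ins, not the real window_service types):

```rust
// Toy model of the counter: when blockstore reports that an inserted shred
// already exists, only bump the metric if the shred arrived through repair,
// so that it isolates redundant repair traffic.
enum PossibleDuplicateShred {
    Exists(Vec<u8>), // the already-existing shred's payload (stand-in type)
    Other,
}

#[derive(Default)]
struct RepairMetrics {
    num_repaired_shreds_already_exist: u64,
}

fn record_duplicate(dup: PossibleDuplicateShred, repaired: bool, metrics: &mut RepairMetrics) {
    if let PossibleDuplicateShred::Exists(_shred) = dup {
        if repaired {
            // The shred came back through repair, but turbine/recovery beat it here.
            metrics.num_repaired_shreds_already_exist += 1;
        }
    }
}

fn main() {
    let mut metrics = RepairMetrics::default();
    record_duplicate(PossibleDuplicateShred::Exists(vec![0u8; 8]), true, &mut metrics);
    record_duplicate(PossibleDuplicateShred::Exists(vec![0u8; 8]), false, &mut metrics);
    record_duplicate(PossibleDuplicateShred::Other, true, &mut metrics);
    println!("redundant repaired shreds: {}", metrics.num_repaired_shreds_already_exist); // 1
}
```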
Is this essentially the same as blockstore-insert-shreds.num_repaired_data_shreds_exists?
Looks like spikes around 250 shreds. Is this a summation, average? What's the grouping interval?
I think it would be helpful if the plot shows the total number of repair shreds as well. Like if the plot shows 200, out of how many incoming repair shreds 200 of them were redundant.
I noticed some deduping in sigverify_stage as well. But in my experiments, none of those are due to repairs. But, that deduping is happening within a smaller time window. So it may not catch anything relevant wrt repairs.
Because repair shreds have a trailing unique nonce, I believe the deduper in sigverify-stage does not filter them out (aside from the false positive rate of the deduper).
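A toy illustration of why the nonce defeats packet-level dedup (this is not the actual Deduper, which is a probabilistic filter, and the sizes/nonce layout here are assumptions):

```rust
// If the dedup key is derived from the full packet bytes, two repair responses
// carrying the same shred but different trailing nonces produce different keys
// and are never treated as duplicates of each other.
use std::collections::{hash_map::DefaultHasher, HashSet};
use std::hash::{Hash, Hasher};

fn dedup_key(packet: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    packet.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let shred = vec![42u8; 1024]; // identical shred payload
    let repair_a = [shred.clone(), 1u32.to_le_bytes().to_vec()].concat(); // nonce 1 appended
    let repair_b = [shred.clone(), 2u32.to_le_bytes().to_vec()].concat(); // nonce 2 appended

    let mut seen = HashSet::new();
    assert!(seen.insert(dedup_key(&repair_a))); // first copy accepted
    assert!(seen.insert(dedup_key(&repair_b))); // same shred, different nonce: not deduped
    assert!(!seen.insert(dedup_key(&repair_a))); // an exact byte-for-byte duplicate would be
    println!("ok");
}
```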
Thanks for pointing out. Yes, it's essentially the same as blockstore-insert-shreds.num_repaired_data_shreds_exists. Great that we already have a counter for it.
The graph is just printing values as is, with no formula. The counter's metrics rate is 1.
I think it would be helpful if the plot shows the total number of repair shreds as well. Like if the plot shows 200, out of how many incoming repair shreds 200 of them were redundant.
I'll take a look. Thanks @behzadnouri
Plots: the total repaired shreds received by the fetch stage, and the shreds which already existed in the blockstore. And, applying mean as the formula: total repaired shreds vs. duplicates.

So some good percentage of shreds are actually duplicates. The percentage varies over time.
This seems in line with results we've seen in the past.
One thing I wonder for the duplicate repaired shreds is how many fall into each of these cases: (1) the shred eventually arrived through turbine anyway, or (2) the shred was filled in by erasure recovery.
(2) is the one that really has me thinking. I'm wondering if we should limit outstanding repair requests to what we need to kick things over to recovery.
(2) is a fair point, as erasure recovery can fill in the missing shreds (for which the repairs were already requested). Not sure if there's any clever way to detect it though. I don't think we store any recovery related information in blockstore today.
I am guessing this is still way underestimating the number of redundant repair requests; or something else is going on that we don't yet understand.
According to this binomial distribution calculator (https://stattrek.com/online-calculator/binomial), with 32:32 batches, even if 25% of shreds are not delivered (which is way too large for a monogon cluster), the probability of recovering the batch is still effectively 100%. So I would expect non-redundant repairs to be almost zero.
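As a sanity check on that back-of-the-envelope number, a small stand-alone calculation (assuming independent per-shred loss, which is optimistic if losses are correlated within an erasure set):

```rust
// With 32:32 data:coding shreds, an erasure set is recoverable as long as at
// least 32 of the 64 shreds arrive. Assuming each shred is delivered
// independently with probability p, the chance that recovery fails is the
// lower binomial tail P(X < 32) for X ~ Binomial(64, p).
fn binomial_pmf(n: u64, k: u64, p: f64) -> f64 {
    // ln C(n, k) computed via log-factorials to avoid overflow at n = 64.
    fn ln_factorial(n: u64) -> f64 {
        (2..=n).map(|i| (i as f64).ln()).sum()
    }
    let ln_choose = ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k);
    (ln_choose + (k as f64) * p.ln() + ((n - k) as f64) * (1.0 - p).ln()).exp()
}

fn main() {
    let (num_shreds, num_data) = (64u64, 32u64); // 32:32 erasure batch
    for &loss in &[0.05, 0.10, 0.25] {
        let p = 1.0 - loss; // per-shred delivery probability (assumed independent)
        let fail: f64 = (0..num_data).map(|k| binomial_pmf(num_shreds, k, p)).sum();
        println!("loss = {loss:.2}: P(cannot recover the batch) = {fail:.3e}");
    }
}
```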
I'll do some more digging.
A few points to note: how is the topology of the monogon cluster? Are the nodes geographically distributed?
Problem
Shreds are propagated using 32:32 data:code erasure batches. Despite that, even at times when the delinquent stake is pretty low and the cluster is working fine, we still observe a small but non-zero number of repaired shreds on the cluster. Similarly, on a small GCE cluster, repair metrics are still non-zero.
One possibility is that repair is too aggressive and some repaired shreds are ultimately received from Turbine anyway or recovered through erasure codes.
Proposed Solution
Debug why nodes still repair shreds even when delinquent stake is low and cluster is not overloaded.