There is a race condition between the tip-requesting stage and the broadcast and reply stages. These threads each switch a shared write flag on and off, and that flag can stop the sendQueue from being polled and its contents sent to neighbours. As a result, we see a large spike in dropped transaction requests and the node stops synchronising. Once a node hits this error, the only remedy is to drop and reconnect the neighbour, and even then the problem can resurface whenever a large number of transactions is processed simultaneously.
To fix this, we remove the back-and-forth switching of the write flag.
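For reviewers, here is a minimal sketch of the failure mode described above. It is not the project's code: the class `WriteFlagSketch`, its methods, and the queue capacity of 2 are all hypothetical, and the interleaving is replayed on a single thread so the outcome is deterministic. It only illustrates how one stage clearing a shared flag can leave another stage's messages stuck in the queue until new ones are dropped.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of a neighbour's send path; names are illustrative only.
public class WriteFlagSketch {
    // Small bounded queue (capacity 2 is an assumption) so drops show up quickly.
    private final BlockingQueue<String> sendQueue = new ArrayBlockingQueue<>(2);
    private final boolean useWriteFlag;
    private boolean writeEnabled = false;
    int sent = 0;
    int dropped = 0;

    WriteFlagSketch(boolean useWriteFlag) {
        this.useWriteFlag = useWriteFlag;
    }

    // A stage queues a message and switches the shared flag on.
    void stageEnqueue(String message) {
        if (!sendQueue.offer(message)) {
            dropped++; // queue is full, so the message is lost
        }
        writeEnabled = true;
    }

    // A stage believes its own work is finished and switches the flag off.
    void stageDone() {
        writeEnabled = false;
    }

    // The writer drains the queue, but only if the flag allows it.
    void writerPass() {
        if (useWriteFlag && !writeEnabled) {
            return; // flag is off: nothing is polled or sent
        }
        while (sendQueue.poll() != null) {
            sent++;
        }
    }

    // Replays one fixed interleaving of two stages and the writer.
    static WriteFlagSketch replay(boolean useWriteFlag) {
        WriteFlagSketch node = new WriteFlagSketch(useWriteFlag);
        node.stageEnqueue("tip-request"); // stage A
        node.stageEnqueue("reply");       // stage B
        node.stageDone();                 // A clears the flag while B's message is still queued
        node.writerPass();                // skipped when the flag is in use
        node.stageEnqueue("broadcast");   // dropped if the queue was never drained
        node.stageDone();                 // flag cleared again before the writer runs
        node.writerPass();
        return node;
    }

    public static void main(String[] args) {
        WriteFlagSketch flagged = replay(true);
        WriteFlagSketch unflagged = replay(false);
        System.out.println("with flag:    sent=" + flagged.sent + " dropped=" + flagged.dropped);
        System.out.println("without flag: sent=" + unflagged.sent + " dropped=" + unflagged.dropped);
    }
}
```

In this replay the flagged variant sends nothing and drops one message, while the variant without the flag sends all three. The real threads interleave nondeterministically, so the actual bug appears only under load.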
Type of change
Bug fix (a non-breaking change which fixes an issue)
How the change has been tested
Nodes no longer freeze when synchronising with neighbours.
Change checklist
[x] My code follows the contribution guidelines for this project
[x] I have performed a self-review of my own code
[x] New and existing unit tests pass locally with my changes