Problem

The PoH service generates a fixed number of hashes (configured in the genesis block as hashes_per_tick) before registering a tick. The service thread sets its CPU affinity, but OS scheduling offers no guarantee that other threads will not be scheduled onto the same core. As a result, different node types (leader/validator) take different amounts of time to compute the same number of hashes, and the network drifts as nodes complete their respective slots at different times. The end result can be leader timeouts and dropped slots.
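For context, the tick mechanism can be sketched as a chained hash loop. This is a minimal Python sketch (the actual service is written in Rust, and the hashes_per_tick value here is illustrative, not taken from a real genesis config):

```python
import hashlib
import time

# Illustrative only: the real value comes from the genesis config (hashes_per_tick).
HASHES_PER_TICK = 12_500

def poh_ticks(seed: bytes, num_ticks: int, hashes_per_tick: int = HASHES_PER_TICK):
    """Chain SHA-256 hashes; after every hashes_per_tick hashes, register a tick."""
    state = hashlib.sha256(seed).digest()
    ticks = []
    for _ in range(num_ticks):
        start = time.perf_counter()
        for _ in range(hashes_per_tick):
            state = hashlib.sha256(state).digest()
        # Wall-clock time per tick varies with whatever else the OS schedules
        # onto the core, which is the source of the drift described above.
        ticks.append((state, time.perf_counter() - start))
    return ticks

ticks = poh_ticks(b"genesis", num_ticks=8)
for i, (digest, elapsed) in enumerate(ticks):
    print(f"tick {i}: {digest.hex()[:16]}... {elapsed * 1e3:.2f} ms")
```

The hash chain itself is deterministic; only the wall-clock time per tick depends on scheduling, which is why nodes agree on the chain but not on when each tick completes.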
Proposed Solution
I tried setting the scheduling policy of the PoH service thread to realtime with the FIFO policy. This helps align PoH timing across all nodes.
Running a 5-node network, I plotted the time it took to do 8 ticks on the cluster.
The following graph is for the tip of master.
This graph is with the scheduling change
It can be seen that PoH ticks are more consistent with the change.
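The change boils down to a sched_setscheduler call on the service thread. A minimal Python sketch of the same system call (the actual change would live in the Rust service; priority 99 mirrors the chrt invocation used in the Challenges section):

```python
import os

def try_set_realtime_fifo(priority: int = 99) -> str:
    """Attempt to move the calling thread into the realtime FIFO class.

    Without superuser privileges (CAP_SYS_NICE) or a nonzero RLIMIT_RTPRIO,
    the kernel refuses the request with EPERM.
    """
    try:
        # pid 0 means "the calling thread"
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return "fifo"
    except PermissionError:
        return "eperm"

print(try_set_realtime_fifo())
```

Run unprivileged this prints "eperm", which is exactly the permissions problem described under Challenges below.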
Challenges
The thread itself is not able to set its own scheduling policy due to lack of privileges; doing so needs superuser privileges. I tried using the thread_priority crate and ran into the permissions issue.
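The EPERM comes from the kernel's realtime rlimit: unless the process has CAP_SYS_NICE, a realtime priority request above RLIMIT_RTPRIO (whose soft limit usually defaults to 0) is refused. A quick way to inspect that limit, sketched in Python:

```python
import resource

# RLIMIT_RTPRIO caps the realtime priority an unprivileged process may request.
# With the usual soft default of 0, any SCHED_FIFO/SCHED_RR request fails with
# EPERM, which matches the permissions error hit by the thread_priority crate.
soft, hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
print(f"RLIMIT_RTPRIO: soft={soft} hard={hard}")
```

Raising this limit (e.g. via /etc/security/limits.conf) is one way to let an unprivileged validator set a realtime policy on its own threads.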
We can use chrt to update the scheduling policy from the shell; it has to be run with sudo. The above graphs were captured using this approach. I added the following to remote-node.sh:

sudo chrt -r -p 99 `ps -eT | grep solana-poh-serv | awk '{print $2}'`
This approach cannot be used for external nodes, as the node boots directly from the native solana-validator program.
Just changing the priority of the thread (using renice) did not help with the problem.
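This is expected: renice only adjusts the nice value inside the default SCHED_OTHER class and never moves a thread into a realtime class, so the CFS scheduler can still preempt the PoH thread. A small Python check illustrating that a nice adjustment leaves the scheduling class unchanged:

```python
import os

# renice only adjusts the nice value inside the default SCHED_OTHER class;
# it never moves a thread into a realtime class, so CFS can still preempt it.
policy_before = os.sched_getscheduler(0)
os.nice(5)  # lowering our own priority needs no privileges
policy_after = os.sched_getscheduler(0)
print(policy_before == policy_after)  # True: nice value changed, class did not
```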