microsoft / verona-rt

The runtime for the Verona project
MIT License
25 stars 14 forks source link

Work Stealing #48

Open mjp41 opened 1 month ago

mjp41 commented 1 month ago

During @vishalgupta97 internship we observed problems with the work stealing algorithm that if all the work was generated by a single scheduler thread, then the other threads had heavy contention in the stealing code. This is due to the stealing code only stealing a single item at a time.

There are a few papers that suggest stealing in larger units than single work items:

The suggestions is that stealing at most half of the work is sensible.

This PR changes the implementation of the scheduler queues to support a steal all operation. By providing N (N=4 in the PR) queues per scheduler thread, then we are able to steal approximately 1/4 of the work on a scheduler thread.

The PR adds a very simple benchmark where a single behaviour generates all the work in the system. Configured with 4 cores, we get

Before this PR:

Scheduled all work took:
        317 ms
Elapsed:
        606 ms

While with his PR:

Scheduled all work took:
        234 ms
Elapsed:
        251 ms

This is running on my laptop with just four cores, so bigger experiments should be undertaken.

There is also a lot of opportunity for further investigations, in terms of different strategies for stealing and scheduling work. This PR represents a simple position that improves the state of the system.