Add support for optimistic concurrency testing

Eh2406 commented 7 years ago

Recent discussion brought up suggestions for optimistic parallel testing.

We should look into allowing these features. I was going to start experimenting tonight, but @edunham mentioned incoming refactorings. So I thout I'd just leave an issue to remind me to come back to it.

Macarse commented 7 years ago

@Eh2406 any update?

Eh2406 commented 7 years ago

I have not worked on it. Hopefully I will have time to get back to it. @edunham, how go the refactorings?

edunham commented 7 years ago

@Eh2406 I've got it passing pep8 which was the first hurdle. The next set of changes should not affect any test results, and new tests would be absolutely wonderful for verifying the changes. Please add tests for anything and everything that you can think of!

Eh2406 commented 7 years ago

So this is a form of auto rollup, not tests of homu. Adding test of homu would be a laudable goal. Sorry for the confusing title.

I am reading up on bors-ng and zuul. Trying to determine which approach requires less intrusive changes to implement and which lead to the biggest difference in understanding for our users.

bors-ng does a optimistic rollup of all approved prs, if pass then merge else try a rollup of half the prs. This is a simple single threaded strategy. On the other hand the meaning of the queue will be different.

zuul starts a separate branch for each pr in the queue this branch has the state the auto will have when this pr is merged with all the predecessors. Then when auto passes its tests then auto merges with master and auto gets set to the branch pr. If it fails its tests then that pr is rejected and all temporary branches have to be rebuilt. Pro: the history is exactly what wud have been created in our current setup. Con: Has many thing going on concurrently that need to be coordinated, and does not reduce the compute cycles needed to empty the queue just make wall times faster.

Next to look at the code to determine how hard this will be to implement.

edunham commented 7 years ago

Ah, sorry for the confusion and thanks for clarifying.

The most important thing to keep in mind when looking at how to stick others' optimistic testing schemes into Homu is that Servo's homu must never land a lower-priority PR when there is a max-priority PR in the queue. This wasn't previously a big deal but the feature is more essential now that it's on the way to becoming part of the Firefox sheriffs' workflow (see https://github.com/servo/homu/issues/111 for background).

Eh2406 commented 7 years ago

Yes I have been keeping that in mind, thanks for clarifying its importance.

For zuul's approach there are no changes required. It still tests all changes separately and in order, just concurrently.

For bors-ng 's approach I have been imagining a to clever by half solution. In which we split on the sum(priority_of_rollup) rollup instead of just splitting the count of prs. And we start by rolling up at most max-priority of prs. Then we automatically have that a max-priority pr will be run on its own. Once a max-priority/(2 ** n) pr is at the top it will be in at most n speculative rollups before being tried on its own. But perhaps a stratforward max-auto-rollup=# for each pr that specifies the number of things it can be merged with is better.

Do you have a link I could read on Gerrit's "fancier rollup logic"? It sounds relevant to this topic.

frewsxcv commented 7 years ago

For posterity, OpenStack has a project similar to bors/homu called Zuul which has a feature called 'gating' that seems relevant:

https://docs.openstack.org/infra/zuul/gating.html

Eh2406 commented 7 years ago

Fundamentally Zuul is running all the same test as the current strategy, just running them in parallel. Unfortunately, rust lang "Right now we unfortunately just don't have capacity". So I have been focusing on adding procedurally generated rollups.

After thinking through several complicated bisecting strategies I am leaning toward something small and conservative. a. Try a rollup of the hole list. b. If it fails test each commit in order, as we do now. c. If one of the separate commits fails then that is probably why the rollup fails so start back at a.

There is also a good discussion of whether we even want procedurally generated rollups going on at internals. Perhaps we need some way to tell homu what should be considered for a rollup.

notriddle commented 7 years ago

I don't see how your approach is any better than just cutting in half, especially if the bad PR winds up in back.

As for the priorities, it doesn't have to be complicated. You need to:

Make sure that every PR in a batch has the same priority. This is the same way bors-ng does multiple branches.
Pick PRs with a higher priority over ones with a lower priority. That's an ORDER BY clause.
When a PR gets added, if the current running batch has a lower priority than the new PR, cancel the current batch and put the batch's PRs back into the queue.

You might also want a max-batch-size option, but that's really orthogonal to the priority.

Eh2406 commented 7 years ago

Indeed it is strictly speaking worse than bisecting when measured by tested pr's/test cycle. The suggested algorithm is still O(n). In the case of a bad pr at the end of the rollup this strategy is worse than current behavior. I see two advantages for it, which I except may not be convincing.

I think it will involve a minimal amount of code changes most of which are adding the ability for the bot to do a rollup on its own. Most of these changes could be reused for any improved handling of a rollup failure like bisecting. So this seems like a good first step.
There is some skepticism about rollups in the community. In the case that all prs are bad, this algorithm rejects the first item on the queue at worst every other test attempt. Bisecting can "waste" log(n) before getting back to work and testing the first item on the queue. If the community likes the improvement in shorter queue length, then we will have an easier time convincing them to accept longer "hiccups" for better average performance in the future.

Making sure all pr's in a rollup have the same priority, sounds like a conservative place to start. Thanks for that suggestion.

The other two should be handled by homu's not having batches, just pr's and a queue. If a user wants to preempt the current rollup they can use a force like they would on any other pr. If they just want it to be next, then set a high priority and it will be at the top of the queue when rollup passes or fails its tests.

servo / homu

Add support for optimistic concurrency testing #102