perf: fewer intersections in satisfier

Eh2406 commented 6 months ago

This is some "straightforward" re-factoring to the satisfier code to take advantage of #171. This is not my first time following this line of inquiry. In previous attempts I have only saved the final solution, and then not been confident that the final solution had the same semantics as the original. So this time I kept a separate commit for each transformation. I also kept the original code around with debug asserts to make sure they had the same behavior. Each commit passes all tests. I would strongly recommend reviewing this one commit at a time, and looking at the commit message for why I thought it was correct.

Some highlights: 1299c67 goes from 2 intersections to 1 for each decision searched in the history. 5127437 goes from a linear to a logarithmic scan. 78f8fb1 and the commits that follow are just things I noticed while staring at this code and can be left for follow-up PRs if their controversial.

As for performance the normal benchmarks think this is lost in the noise. Because they don't do much backtracking and intersection is fairly cheap. The new synthetic benchmark slow_135_0_u16_NumberVersion improves its runtime by 82%!

mpizenberg commented 6 months ago

This is quite the list of changes. I've made a first read pass. Most of things seem ok, but I was very skeptic about what I read here 2ee7c0857. Implications (=>) is not equivalence (<=>). I'll have to re-read all that at a more reasonable hour.

Eh2406 commented 6 months ago

I made such a long list of changes so that it would be easy to call out exactly where I had made an incorrect leap of logic. That does seems suspicious. I am comforted by the assertion passing, but that doesn't mean I can prove it's valid. ... Speaking of the assertion, I used cfg(test) not cfg(debug_assertions), let's see if it still works with that fixed. Yes it did. Sorry for the force push.

I'm going to set the prop tests to keep running overnight, and see if they found anything by the morning.

Eh2406 commented 6 months ago

So here's the way I'm thinking about it at the moment, there are four kinds of versions. A version can either be contained in accumulated_intersection.intersection(start_term) (for space called accumulated) or not, simultaneously the version can either be contained in incompat_term or not.

`term`	type 1	type 2	type 3	type 4
`accumulated`	True	True	False	False
`incompat_term`	True	False	True	False
`accumulated.intersection(incompat_term)`	True	False	False	False
`accumulated.intersection(incompat_term) == accumulated` (aka `old`)	True	False	True	True	True
`accumulated.intersection(incompat_term.negate())`	False	True	False	False
`accumulated.intersection(incompat_term.negate()) == empty` (aka `new`)	True	False	True	True

(Edit: Now that I have fixed my mistakes) It looks to me like it is correct.

Eh2406 commented 6 months ago

Having run the prop tests overnight, they still haven't found anything.

mpizenberg commented 6 months ago

Having run the prop tests overnight, they still haven't found anything

That's great!

I'm taking some family time for the end of year so probably won't have a more in-depth review of this until the new year. I'd like to review this more in depth, but don't want to be a big blocker if you have a lot more planned during this period.

Eh2406 commented 6 months ago

Please enjoy your holidays with no stress from this project. I suspect you will not be the only one with limited time to look at this. (I'm very excited to see how much difference this makes on the non-synthetic cases, but I must be patient.)

Eh2406 commented 6 months ago

I discussed this with a mathematician friend tonight. I believe the transformation is correct, and is easier to understand starting a little further back. (Utilizing the shorthand accumulated for accumulated_intersection.intersection(start_term)) The original formulation was accumulated.subset_of(incompat_term). This asks if there is an element of accumulated that is not in incompat_term. accumulated.intersection(incompat_term.negate()) directly computes all the versions in accumulated that are not in incompat_term.

konstin commented 6 months ago

Great work!

main: 1fd7ec388a230936fb7c63e68120065eb8474c76 branch: 201d0b7f63d1b74bb267f3e0a40ff86b30be8f41

tf-models-nightly main

real 0m2,533s user 0m2,226s sys 0m0,303s

tf-models-nightly branch

real 0m1,235s user 0m0,949s sys 0m0,292s

bio_embeddings main

real 1m42,222s user 1m40,462s sys 0m1,651s

bio_embeddings branch

real 0m32,804s user 0m30,528s sys 0m1,740s

mpizenberg commented 6 months ago

I discussed this with a mathematician friend tonight. I believe the transformation is correct, and is easier to understand starting a little further back. (Utilizing the shorthand accumulated for accumulated_intersection.intersection(start_term)) The original formulation was accumulated.subset_of(incompat_term). This asks if there is an element of accumulated that is not in incompat_term. accumulated.intersection(incompat_term.negate()) directly computes all the versions in accumulated that are not in incompat_term.

Thanks for that!

pubgrub-rs / pubgrub

perf: fewer intersections in satisfier #174