Closed Niols closed 1 year ago
For the record (understanding that it's easy to change in the future), in my experience, a mutable implementation (based on ST
) is quite a bit faster. So if performance is an issue it's something to think about.
Note at this point that the performance seem to be basically the same as before, if not a bit slower (because more indirections?). Implementing the optimisations should improve this drastically, if it ever becomes relevant.
- The OHearn test has caused issues before, but it makes sense that we could have metavariables potentially equal that are unassigned. But maybe we need some more complex tests if there is generally only this one test where these edge cases come into play.
It is definitely weird that only one test changes behaviour depending on whether or not we allow meta-variables to be equal without assignment. We really need more tests that expose that kind of edge cases, indeed.
- On a similar note, I mentioned before but we probably would want some tests just testing the union-find implementation on its own, especially once the optimizations are implemented.
Definitely! It will also be good practice for me to have a look at how tests are done. I will try adding tests for the union-find implementation and also maybe some extra tests for the symbolic engine in general (which then would probably escape the scope of this PR).
- The optimizations would be nice, and it would also be nice to see how much the optimizations affect things. For example, I don't really have any intuition as to how often we do nontrivial unions of metavariables or anything
Well, the weird thing is that, by mimicking the previous implementation, I only ended up using lookup
, trivialUnion
and trivialInsert
, while the whole purpose of the union-find is to have this “fancy” union which takes care of things for you. Basically, I think it shows how deeply-inlined the union-find was, and I think it calls for a full rewrite of unifyMetaWith
and unifyNewMetaWith
into (almost only) one single call to union
with the right merge function.
To be honest, I am not even sure at this point that I understand how this algorithm can possibly work without ever triggering a non-trivial union. I suppose there is an important property of meta-variables that makes this happen (or rather not happen), but I don't see it yet.
- It seems like using a State monad would make things a bit nicer, and apparently maybe even faster!
It would definitely make the interface of the UnionFind
module look nicer. And also maybe it would make easier, when using UnionFind
, to propagate the modifications (and therefore the optimisations, when we implement them)?
- I don't think I understand what persistent union-find means. Is the idea that we would no longer use a tree-like structure to keep track of the equivalences?
This question was the occasion for me to read on the difference between “persistent” and “immutable”. In short:
Data.Map.insert
) give you a new pointer to an updated structure but old pointers are still valid. Observations (eg. with Data.Map.lookup
) cannot distinguish the state of an old pointer before or after updates.(If you already knew all that, sorry, then it was only for me :p)
My point was therefore that, as of now, UnionFind
provides an immutable data-structure. But union-find is a funny case where it is easy to make find
have a side-effect instead of returning an updated structure while keeping the persistent interface. You get the nice properties of persistence but the interface becomes slightly simpler (because find
doesn't need to return a union-find anymore). Additionally, you get the speed-up of the optimisation and this speed-up is shared between diverging versions of the union-find, ie. if you start from a union-find u
and update it to v
, then start from u
again and update it to w
, and then you call find
on v
, you gain speed-ups for further calls to find
on v
but also on w
. In our case where we keep branching at each if-then-else
, it might actually give us an extra speed-up. This is a consideration for muuuuch later, and it is also out of my Haskell skills because I basically have no idea how to use/implement mutable structures :p
Overall, this seems very nice, good work! I suppose if this ultimately gets merged (or rather if Victor's PR gets merged), the most pressing issues would be adding optimizations and adding tests.
I'll start by adding tests because it is also very good practice for me. This will allow us to measure the optimisations if we ever decide to implement them.
Thanks for the response, I understand the persistence property now. As for tests, if you would like to pair program some at some point this week or next, let me know. It would be a good learning experience for me as well.
As for tests, if you would like to pair program some at some point this week or next, let me know. It would be a good learning experience for me as well.
That sounds great; let's do that!
May I inquire why this PR is still ongoing? It's been open for a month, it should have been merged by now.
@0xd34df00d
That's a really good separation of responsibilities, thanks! In the long run, it'd be cool to have that as a separate library on Hackage, as I can see it useful elsewhere as well, and other libraries have some downsides (at least from a quick survey).
I had the same impression about other libraries which is why I decided to go for a home implementation. I suppose it could be separated from Pirouette indeed.
Having said that, I only have minor concerns. In particular, I'm not sure that
Ancestor
should really containMaybe
— if the UF user has a use case with partiality, they can just haveUnionFind key (Maybe val)
. At least, I don't feel likeNothing
is intrinsic to the UF itself. I see there was some discussion about that already, but I'm not sure how relevant it is now, so we can catch up on that separately.
It is a good question and I'm not sure what the right answer is. For now, Nothing
is used to convey the fact that we know of an equivalence class but we haven't bound a value to it. It seems sometimes important, for instance for code that learns equivalences and assignments one by one and not necessarily in an order that would work without a Maybe
.
The code would probably be nicer without a Maybe
under Ancestor
. At least, it would avoid merging two different pieces of information into one, and it would avoid potential bugs like the one discussed in https://github.com/tweag/pirouette/pull/150#discussion_r991250090.
Also, ideally it'd be great to have more tests, especially property-based ones, but that can wait until when (and if) the UF implementation will be factored out to a separate library. Passing the usual pirouette tests is good enough for now, since they exercise enough of UF machinery.
I would be all for property-based tests for this little piece of code. I hope that the usual Pirouette tests are good enough but I am not sure. Do we have property-based testing machinery in the Pirouette project or would that be adding big dependencies?
@aspiwack
May I inquire why this PR is still ongoing? It's been open for a month, it should have been merged by now.
You are completely right, this PR should be merged by now (or closed; but not stalled). I left it hanging because I wasn't sure that it was the right way to go. I will make sure that we review and merge it quickly.
This PR seems ready to be merged. It passes all the tests that were previously there as well as the newly-introduced ones. It now implements the classic optimisations of the Union-Find structure. It could be further optimised by using a structure different from a Map (but still persistent), but I don't think it makes much sense to go through the trouble right now. We can always iterate later on if we feel the need.
This PR builds on top of #146. It comes from the remark that the code for constraints unification basically inlines the implementation of a union-find data structure, which makes it harder to read, harder to optimise (with eg. the typical union-find optimisations) and more error-prone. The current PR implements a tiny union-find module and
What is a Union-Find data structure?
Not going to dig into history here, but a union-find data structure basically allows to represent efficiently equivalence classes over values as long as there exists a total comparison function. A Union-Find has two main functions:
union
merges two equivalence classes together. This typically has the typekey -> key -> UnionFind key -> UnionFind key
.find
returns a representative of the given equivalence class. This typically has the typekey -> UnionFind key
with the invariant that for alluf
,k1
,k2
,find k1 uf == find k2 uf
if and only ifk1
andk2
are in the same equivalence class. This function is typically not exposed to the user to which we rather expose an equivalence test of typekey -> key -> UnionFind key -> Bool
.Union-Find can very easily be extended to represent a map from equivalence classes to values. Such a structure has three main functions:
union
merges together two equivalence classes. Because these equivalence classes are associated to values, one must specify how to merge the values as well. Typically, the type would beunion :: (value -> value -> value) -> key -> key -> UnionFind key value -> UnionFind key value
.find
returns a representative of the given equivalence class as well as the associated value. This typically has the typekey -> UnionFind key value -> (key, Maybe value)
. Once again, this function is typically not exposed and we will prefer to expose an equivalence test (key -> key -> UnionFind key value -> Bool
) and a lookup (key -> UnionFind key value -> Maybe value
).insert
(oradd
or whatever, I don't think there is a well-defined name for that) adds a mapping from key to value. Because there might already be a mapping from this equivalence class to a value, one must specify how to merge the values. Typically, the type would beinsert :: (value -> value -> value) -> key -> value -> UnionFind key value -> UnionFind key value
.Some optimisations can make such structures very efficient with quasi-linear time for both operations. One of them requires that the
find
operation itself updates the structure, giving the types:(Note: a state monad would make those types much nicer.)
Where is this inlined Union-Find?
As of f12b951 on #146, a set of constraints is represented as follows:
https://github.com/tweag/pirouette/blob/f12b95104b8b17996228587867c85f193944b7ab/src/Pirouette/SMT/Constraints.hs#L29-L41
Note in particular the fields
csAssignments :: M.Map meta (TermMeta lang meta)
andcsMetaEq :: M.Map meta meta
that only make sense when taken together:csMetaEq
maps meta-variables to other meta-variables with the invariant that ifv
maps tov'
, thenv'
is smaller thanv
for some comparison (here, the lexicographic order, but it does not matter).csMetaEq
therefore represents an equivalence relation of meta-variables as well as a way to pick their representative: the smallest element according to the comparison.csAssignments
maps meta-variables to terms with the invariant (not mentioned in the comments) that a key incsAssignments
is always a representative incsMetaEq
. It is therefore intended as a map from equivalence classes to terms.Keeping those invariants brings complexity to the unification functions. See for instance
unifyMetaWith
:https://github.com/tweag/pirouette/blob/f12b95104b8b17996228587867c85f193944b7ab/src/Pirouette/SMT/Constraints.hs#L135-L147
and in particular line 142 that resolves the representative of the equivalence class of the given meta-variable. In this example, abstracting and using a union-find library would bring two advantages. First, it would make the code simpler by removing some cases. Second, it would make clearer what happens in some corner cases. For instance, in
unifyMetaWith
, what happens if there is an equivalence class incsMetaEq
but no binding for it incsAssignments
? The variable is considered to be a new meta-variable and is sent tounifyNewMetaWith
. Is this a bug? Is this intended? Because it sure seem to be violatingunifyNewMetaWith
's invariant.How would it look with a Union-Find?
Look at the code! In 178003f21fbbb5d5aeb7a9318a88bcdc9af81d2f, there is in particular:
https://github.com/tweag/pirouette/blob/178003f21fbbb5d5aeb7a9318a88bcdc9af81d2f/src/Pirouette/SMT/Constraints.hs#L30-L40
and
https://github.com/tweag/pirouette/blob/178003f21fbbb5d5aeb7a9318a88bcdc9af81d2f/src/Pirouette/SMT/Constraints.hs#L130-L144
Additional Notes and Questions
Is it possible for meta-variables to be equivalent (and mapped in
csMetaEq
as such) without having any binding incsAssignments
? I tried to assume that it wasn't but the “OHearn” test (and only that one) started to fail. It goes against my initial intuition but I see now why it would make sense. I would therefore need someone's opinion on this.Some previous functionalities are missing (eg. the
Monoid
instance forConstraintSet
; actually, that one seems buggy to me in the previous implementation). They do not seem used anywhere, though?At this point, I did not implement any of the optimisations of the union-find data structure. The union-find's functions' interface already prepares for this, though.
Maybe we want to give a better interface to this union-find module by using a state monad somewhere?
A persistent union-find is a peculiar data structure in that the function
find
modifies the structure without changing its semantics. In this particular case, it means that it is possible to makefind
actually mutate the memory without violating the interesting properties of persistence. This gives an extra tiny speedup in case the union-find is used in several places without much differences. This makes things a bit dirty though.