unisonweb / unison

A friendly programming language from the future
https://unison-lang.org
Other
5.79k stars 270 forks source link

Performance concern: `Name`'s `Ord` instance is not cheap #2465

Closed mitchellwrosen closed 3 years ago

mitchellwrosen commented 3 years ago

I noticed in a time profile that comparing Names contributes a nontrivial amount of time to computing a branch diff.

Here's the Ord instance:

-- The `Ord` instance for `Name` considers the segments of the name
-- starting from the last, enabling efficient search by name suffix.
--
-- To order names alphabetically for purposes of display to a human,
-- `sortNamed` or one of its variants should be used, which provides a
-- Unicode and capitalization aware sorting (based on RFC5051).
instance Ord Name where
  compare n1 n2 =
       (reverseSegments n1 `compare` reverseSegments n2)
    <> (isAbsolute n1 `compare` isAbsolute n2)

Perhaps we should consider replacing this instance with something far cheaper to compute (ideally O(1)), since compare is often assumed to be cheap and performed many times on the fly, especially when dealing with Maps, Sets, Relations, etc.

Hopefully it will be possible to achieve this without making the use cases for which we want to sort and search by name suffix prohibitively expensive. I'm not sure where we do that kind of thing currently.

runarorama commented 3 years ago

Seems like these should be two different types with two different Ord instances. So like Name for the ordinary use, and SuffixedName or something when wanting to search by name suffix.

pchiusano commented 3 years ago

While it's possible to have the "source of truth" for Name be ordered in some other way (and that was considered), it just means you have to create and maintain auxiliary indices to support the suffix-based lookup and search that we do everywhere (and which we want to be logarithmic time). Creating and maintaining aux indices is more complicated and hasn't worked out great previously.

If this is a bottleneck, we could change the representation of Name to be Name [NameSegment] IsAbsolute where the name segments are in reverse order. Or even pack that into a single Text. (So List.map has an internal representation of "map.List") That would make comparisons just as cheap as comparing Text while also giving us the suffix-based querying we want.

We previously would do an O(n) "suffixification" pass over the namespace to compute an index for supporting suffix lookups - this was slow, didn't let us determine ambiguous suffixes (https://github.com/unisonweb/base/issues/44), and it needed to be recomputed every time the namespace changed. It also was getting recomputed on every character typed in the fuzzy find.

There is nothing useful about the default lexicographical ordering of Text other than that it's fast (but if we just want that, we might just use hashing instead of Ord). It doesn't correspond to human alphabetical ordering. It doesn't let us do suffix-based search. IMO it doesn't deserve to be the default ordering for names and it was chosen previously without really any thought. :)

More background on all this in #2290

mitchellwrosen commented 3 years ago

Thanks for the background.

I totally agree that the Ord instance derived from Text is not semantically so useful or meaningful, and I really like how searchRan can carve out names with similar suffixes.

To that end I do like the idea of changing Name's representation to allow for a faster Ord implementation at the expense of some other functions.

I'm not sure a newtype is the right solution here - I don't think we want any (new)type that has an expensive Ord instance which is put into a Relation.