travitch / persistent-vector

Persistent vectors for Haskell based on array mapped tries
BSD 3-Clause "New" or "Revised" License
27 stars 4 forks source link

Remove indirections #12

Open treeowl opened 4 years ago

treeowl commented 4 years ago

This is kind of a big challenge. The data constructors for the internal nodes shouldn't actually be necessary. We should be able to get rid of them using primitive-unlifted, or a local copy thereof. I have a general sense of the shape of things, but it's all rather tricky.

The first idea is that we should have unlifted arrays of unlifted arrays ... of arrays of elements. The type of that whole thing presumably has to be calculated by a type family from the depth. How do we know that depth at the type level? We fake singletons, and store them as the shift fields. Something like that, anyway.

treeowl commented 4 years ago

Actually, I don't think we need to track tree height in the types. We could simulate something very much like what we have now using enough unsafe coercions. But I'd prefer to track it, for two reasons:

  1. The most important reason, probably: we'll be able to use types to help enforce the height-balance invariant.
  2. As GHC improves, we'll be able to reduce the number of unsafe coercions we need. I imagine that within a year or two, we may only need them at the root node (for reasons relating to type roles), and maybe we'll eventually be able to get rid of them altogether.
treeowl commented 4 years ago

Question: why do we have an array of internal nodes in the root node, and bl not just a single one? Right now, that saves us an indirection, but if it's not essential to the structure it can be dropped once that's no longer true.

travitch commented 4 years ago

Is the question: why is intVecPtrs Array (Vector_ a) instead of Vector_ a? I don't think there is a fundamental reason. It would probably just change some index calculations. It might make the implementation of indexing more regular, but that isn't a design goal if it is faster or more space efficient to not have it that way.

treeowl commented 4 years ago

Yes, that's the question. I don't see any obvious reason for it to be significantly faster (aside from the current indirection issue), and more regular code may also be smaller, which can improve performance all by itself.

treeowl commented 4 years ago

The performance upsides

  1. Indexing and updating functions will have to follow roughly half as many pointers. This sounds very good, but we'll need benchmarks to see how big a difference it makes.

  2. The structure will be a little more compact. This probably doesn't make a whole lot of difference in and of itself, because the arrays will mostly be large relative to the boxing overhead.

The performance downsides, as far as I know

  1. In current GHC, we have to choose between

    a. Sticking with something Array-like rather than switching to SmallArray, or b. Accepting that the (unboxed) array read/write/index operations will have to be NOINLINE for safety reasons.

    It's pretty easy to bury this choice in its own module and let benchmarks decide which is better. Once BoxedRep happens (hopefully soon; GHC Gitlab MR 4249 is the beginning, and hopefully the rest won't take too long), this problem will go away.

  2. To keep things reasonably manageable, I expect there to be some performance regressions in GHC versions that don't support UnliftedNewtypes. Personally, I usually just ignore library performance under non-bleeding-edge GHC versions, but this is your library, so it's ultimately your decision.

  3. The documented "strict spine" claim will become utterly true for the tree portion of the structure. That's almost always a good thing, but it sucks for fmap and for traverse in "lazy" Applicatives. The advantage of a lazy spine in those contexts is that the tree can be built out as its leaves are demanded, rather than needing to be produced all at once with thunks in its leaves. Fortunately, performance of those methods doesn't really seem to be a primary goal for the package.

travitch commented 4 years ago

I was looking at that GHC feature - it sounds exciting. Would that ultimately enable us to write data structure that can (levity) polymorphically hold either boxed or unboxed data? That would be amazing. I'd ditch all backwards compatibility for that.

treeowl commented 4 years ago

Would that ultimately enable us to write data structure that can (levity) polymorphically hold either boxed or unboxed data?

Not really. @andrewthad hopes to eventually support levity-polymorphic bindings (for lifted and unlifted pointers), but there's no proposal for that. @ekmett has managed to use backpack to share a lot of code across unboxed things, and maybe between boxed and unboxed. But boxed and unboxed things really have different calling conventions, so true polymorphism is not likely.

treeowl commented 4 years ago

I'm making good progress on this, but it's quite a slog and most of the steps don't seem very exciting all by themselves.

treeowl commented 4 years ago

I'm making progress, but I think I may have taken a wrong turn. Things are getting messy. I think I may have erred when I got rid of the extra array at the top. While that was weird and unnecessary from the standpoint of the data structure, it made the root of the tree unconditionally an internal node, which seems pretty helpful from an unboxing standpoint. Namely, it lets us write something like !(UnliftedArray (TreeType level a)) to unpack the array tree into the root node. Otherwise, we get non-uniform stuff or other complications. Hrmph. My current branch is the "other complications" version, which I think can be made efficient, but is pretty awful to work on.

treeowl commented 4 years ago

So ... somewhat back to the drawing board now. I'm really not surprised. I fully expected this to be a long and iterative process. On the plus side, I've fixed a lot of other problems in the process, and I've gotten a sense of what needs to happen and (probably) how.

treeowl commented 4 years ago

Hmmm .... Thinking more about it, I don't think that going to an extra array is either necessary or sufficient. The trouble is that, using Array and UnliftedArray, we can't extract an element from an UnliftedArray without first knowing if that element is an Array or an UnliftedArray. My unpleasant workaround uses a polymorphic

data Box (a :: TYPE 'UnliftedRep) = Box# a

This can hold an Array# or an UnliftedArray# just fine in the same constructor. I just don't know how to make programming with that not suck.

treeowl commented 4 years ago

As for why wrap at all and not just use the unlifted types directly: the main reason is that Monad demands lifted results, so all the ST stuff must be lifted. Manually passing State# tokens around through deeply nested case expressions is just awful, and I refuse.

travitch commented 4 years ago

That issue (the requirement of the value returned by bind to be boxed) is what basically tanked my Haskell implementation of a SAT solver.

treeowl commented 4 years ago

Well, I guess I'll see how far I can go with Box. If I can give indexing performance a big boost, it might be worth the unpleasant code. If the improvement proves small, it's probably not worthwhile.