travitch / persistent-vector

Persistent vectors for Haskell based on array mapped tries
BSD 3-Clause "New" or "Revised" License
27 stars 4 forks source link

Split Vector type #6

Closed treeowl closed 4 years ago

treeowl commented 4 years ago

We naturally have a top level, root/empty, and everything else, internal/data. Splitting these apart immediately gets rid of a bunch of "impossible" errors and also discards a bunch of impossible case branches.

treeowl commented 4 years ago

Next step, I think: take advantage of this change to get rid of the internal node constructors altogether, representing everything but the root as a nest of ArrayArray#s ending in an Array#. It's going to look horrible, and I doubt there's any type safety-net, but it'll roughly halve the number of indirections.

travitch commented 4 years ago

Very cool - I'm always in favor of removing error states. It looks like there is no space penalty, which is great.

I'm much more familiar with GADTs compared to when I first wrote this - there might be a decent way to encode some of the internal invariants at the type level to get a bit more safety on the Array#s.

travitch commented 4 years ago

I was a bit disappointed in the performance of this data type w.r.t. garbage collection last time I played with it, but it has been a while. I should re-investigate the benchmarks on modern GHC. I'd love this to be a broadly-useful data structure.

treeowl commented 4 years ago

@travitch I'm struggling to guess at the meanings of the various Int fields. Could you help me out? Also, there's a mysterious comment

-- Note: using Int here doesn't give the full range of 32 bits on a 32
-- bit machine (it is fine on 64)

Do you think you could explain? Perhaps-relatedly, what is the maximum capacity of a Vector or a 32/64-bit system?

treeowl commented 4 years ago

I don't think GADTs can help with that; digging through a pile of GADT constructors is just as bad as digging through any other sort of node. Maybe there's some tricky way with type families or something, but I bet it will destroy Coercible a b => Coercible (Vector a) (Vector b), which is generally desirable. I seem to remember @AndrasKovacs having some kind of idea for making these kinds of things more type-safe, but I don't know what that idea is.

Side note: the failed checks don't seem to be my fault.

treeowl commented 4 years ago

My biggest personal question about the Int fields is whether one of them can be used to get the height of the tree. That's the key to getting rid of those indirections.

travitch commented 4 years ago

I believe the fields are:

travitch commented 4 years ago

@travitch I'm struggling to guess at the meanings of the various Int fields. Could you help me out? Also, there's a mysterious comment

-- Note: using Int here doesn't give the full range of 32 bits on a 32
-- bit machine (it is fine on 64)

Do you think you could explain? Perhaps-relatedly, what is the maximum capacity of a Vector or a 32/64-bit system?

I think I was reading from the Haskell standard at that point and saw that the size of an Int is only guaranteed to be up to 30 bits, but I don't think that is actually true for GHC. I don't recall the exact size bound off the top of my head, but I do remember working out that you don't have enough address space to allocate enough elements to totally fill the tree (without creative sharing).

travitch commented 4 years ago

Also, I'm curious - do you have a use case in mind for persistent-vector?

treeowl commented 4 years ago

I don't really use libraries so much. I'm more about writing/improving them, at least for now.