Closed treeowl closed 4 years ago
Next step, I think: take advantage of this change to get rid of the internal node constructors altogether, representing everything but the root as a nest of ArrayArray#s ending in an Array#. It's going to look horrible, and I doubt there's any type safety net, but it'll roughly halve the number of indirections.
Very cool - I'm always in favor of removing error states. It looks like there is no space penalty, which is great.
I'm much more familiar with GADTs now than when I first wrote this, so there might be a decent way to encode some of the internal invariants at the type level to get a bit more safety on the Array#s.
I was a bit disappointed in the performance of this data type w.r.t. garbage collection last time I played with it, but it has been a while. I should re-investigate the benchmarks on modern GHC. I'd love this to be a broadly-useful data structure.
@travitch I'm struggling to guess at the meanings of the various Int fields. Could you help me out? Also, there's a mysterious comment
-- Note: using Int here doesn't give the full range of 32 bits on a 32
-- bit machine (it is fine on 64)
Do you think you could explain? Perhaps relatedly, what is the maximum capacity of a Vector on a 32/64-bit system?
I don't think GADTs can help with that; digging through a pile of GADT constructors is just as bad as digging through any other sort of node. Maybe there's some tricky way with type families or something, but I bet it will destroy Coercible a b => Coercible (Vector a) (Vector b), which is generally desirable. I seem to remember @AndrasKovacs having some kind of idea for making these kinds of things more type-safe, but I don't know what that idea is.
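To make the Coercible concern concrete, here is a minimal sketch. Vec and Age are hypothetical stand-ins, not the library's types. Because the parameter of Vec is only used representationally, GHC can convert Vec Age to Vec Int with coerce at no runtime cost. If the parameter were instead fed to a type family, its role would become nominal and this definition would no longer typecheck.

```haskell
import Data.Coerce (coerce)

-- Hypothetical wrapper type; not part of persistent-vector.
newtype Age = Age Int
  deriving (Eq, Show)

-- Hypothetical stand-in for Vector. The parameter is only stored,
-- so GHC infers a representational role for it.
data Vec a = Vec [a]
  deriving (Eq, Show)

-- Zero-cost conversion: relies on Coercible Age Int lifting to
-- Coercible (Vec Age) (Vec Int).
agesToInts :: Vec Age -> Vec Int
agesToInts = coerce
```

For instance, agesToInts (Vec [Age 1, Age 2]) evaluates to Vec [1, 2] without traversing the list.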
Side note: the failed checks don't seem to be my fault.
My biggest personal question about the Int fields is whether one of them can be used to get the height of the tree. That's the key to getting rid of those indirections.
I believe the fields are:
vecSize: Number of elements in the vector
vecOffset: Supports taking constant-time slices of vectors (I think it is the first index into the vector that contains an element for the current slice)
vecCapacity: The amount of space available in the vector slices (i.e., in intVecPtrs - the number of elements not in the tail)
vecShift: I'm a bit embarrassed to say that I forget and did not document it enough. It appears to be related to the height of the tree in a weird way. I think it is height * 5. I assume it is done that way to avoid more expensive computations of some derived value based on the height...
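If vecShift really is height * 5, as guessed above, the height can be recovered by dividing by 5, and the shift doubles as the starting point for picking a child slot at each level. The following is only a sketch under that assumption, plus the further assumption of 32-way branching (5 index bits per level); heightFromShift and pathTo are hypothetical helpers, not functions from the library.

```haskell
import Data.Bits (shiftR, (.&.))

-- Assumption: 32-way branching, so each level consumes 5 index bits.
bitsPerLevel :: Int
bitsPerLevel = 5

-- Mask selecting the low 5 bits of an index.
levelMask :: Int
levelMask = 31

-- Height above the leaf arrays, assuming shift == height * 5.
heightFromShift :: Int -> Int
heightFromShift shift = shift `div` bitsPerLevel

-- Slots visited from the root down to the leaf array; the last
-- entry is the position inside the leaf array itself.
pathTo :: Int -> Int -> [Int]
pathTo shift ix =
  [ (ix `shiftR` s) .&. levelMask
  | s <- [shift, shift - bitsPerLevel .. 0]
  ]
```

With a shift of 10, index 5000 would be reached through slots 4, 28 and 8. Storing the shift rather than the height means the lookup never has to multiply by 5.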
As for the comment: I think I was reading from the Haskell standard at that point and saw that an Int is only guaranteed to have at least 30 bits, but I don't think that limit actually applies to GHC. I don't recall the exact size bound off the top of my head, but I do remember working out that you don't have enough address space to allocate enough elements to totally fill the tree (without creative sharing).
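A rough way to reason about the bound is to count how many 5-bit levels are needed to cover every non-negative Int index. This is a back-of-the-envelope sketch assuming 32-way branching; levelsFor and intIndexBits are hypothetical names, and nothing here is taken from the library.

```haskell
import Data.Bits (finiteBitSize)

-- Assumption: 32-way branching, so each level consumes 5 index bits.
bitsPerLevel :: Int
bitsPerLevel = 5

-- Levels (leaf arrays included) needed to address every index that
-- fits in the given number of index bits, rounding up.
levelsFor :: Int -> Int
levelsFor indexBits = (indexBits + bitsPerLevel - 1) `div` bitsPerLevel

-- Non-negative index bits of Int on the current platform
-- (the sign bit is excluded).
intIndexBits :: Int
intIndexBits = finiteBitSize (0 :: Int) - 1
```

The Haskell Report only guarantees the range [-2^29, 2^29 - 1], which gives 29 index bits and 6 levels. A 32-bit Int gives 31 index bits and 7 levels, and a 64-bit Int gives 63 index bits and 13 levels. In the 64-bit case memory runs out long before the index range does.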
Also, I'm curious - do you have a use case in mind for persistent-vector?
I don't really use libraries so much. I'm more about writing/improving them, at least for now.
We naturally have a top level, root/empty, and everything else, internal/data. Splitting these apart immediately gets rid of a bunch of "impossible" errors and also discards a bunch of impossible case branches.
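The split described above might look roughly like the following sketch. It uses plain lists in place of arrays to stay self-contained. The vecTree and vecTail fields and the Node type are hypothetical names chosen for illustration, and the real definitions carry more fields.

```haskell
-- Top level only: a vector is either empty or a root carrying the
-- bookkeeping fields. Lists stand in for the real array types.
data Vector a
  = EmptyVector
  | RootNode
      Int       -- vecSize
      Int       -- vecShift
      [Node a]  -- vecTree (hypothetical name): children of the root
      [a]       -- vecTail (hypothetical name): tail elements

-- Everything below the root: internal nodes or data (leaf) nodes.
data Node a
  = InternalNode [Node a]
  | DataNode [a]

-- Only two cases are possible here, so no "impossible" branch for a
-- root or empty constructor is needed.
nodeCount :: Node a -> Int
nodeCount (DataNode xs) = length xs
nodeCount (InternalNode ns) = sum (map nodeCount ns)

-- Counts the stored elements by walking the tree and the tail.
treeCount :: Vector a -> Int
treeCount EmptyVector = 0
treeCount (RootNode _ _ tree tl) = sum (map nodeCount tree) + length tl
```

Because nodeCount takes a Node rather than a Vector, the compiler's exhaustiveness check covers it fully with two equations, which is where the error calls and dead case branches disappear.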