treeowl / compact-sequences

Stacks and queues with compact representations
BSD 3-Clause "New" or "Revised" License

Representation tweak? #35

Open treeowl opened 4 years ago

treeowl commented 4 years ago

We don't actually need a lazy spine. An alternative representation for a queue, for example:

data N10 n a = N10 !(Array n a) !()
data N11 n a = N11 !(Array n a) !() !(Array n a)
-- () separates the front digit from the rear for clarity.
-- ...

data Queue n a
  = Empty
  | Q10 (N10 n a) !(Queue ('Twice n) a)
  | ...

This has a little more indirection and space overhead, so presumably the fundamental operations will be slower. But it also has some benefits:

  1. length becomes logarithmic time.
  2. We can index into the queue faster. While we have to pay off debits on all the nodes en route to the one we seek, we don't actually have to force them. This should win a nice constant-factor improvement, I believe.
  3. If I'm right about how efficient (still linear-time) appending will work, I believe we should get better cache performance by waiting to force nodes until we've worked out the shapes they're copied into.
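To illustrate point 1, here is a minimal sketch of how length becomes logarithmic on a strict spine. These are not the library's actual types: the real representation uses size-indexed arrays and the type-level 'Twice index, whereas this sketch tracks node sizes as plain Ints purely for illustration.

```haskell
-- Hypothetical simplification of the strict-spine queue: each level
-- holds nodes holding arrays of double the size of the previous level.
data Node a
  = N10 [a]        -- one front digit
  | N11 [a] [a]    -- a front digit and a rear digit

data Queue a
  = Empty
  | Level !Int !(Node a) !(Queue a)  -- array size at this level, node, rest

-- Logarithmic-time length: walk the (strict) spine, counting arrays
-- per level, without forcing any of the arrays themselves.
len :: Queue a -> Int
len Empty = 0
len (Level n node rest) = n * arrays node + len rest
  where
    arrays (N10 _)   = 1
    arrays (N11 _ _) = 2
```

Since the spine is spine-strict but the arrays are not inspected, `len` touches only the logarithmically many level cells.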
treeowl commented 4 years ago

A compromise representation could keep the lazy spine but store its shape alongside, packed in a bit vector. That would speed length calculations and (I think) appends, without needing totally separate implementations of the basic operations.
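A sketch of what that shape-alongside idea could look like. The encoding here is hypothetical (one bit per level, set when the level holds two digits rather than one); the point is that length is computable from the bits alone, without touching the lazy spine.

```haskell
import Data.Bits (testBit)

-- Hypothetical packed shape: 'levels' counts the spine levels, and bit i
-- of 'bits' is set when level i holds two digits (front and rear).
data Shape = Shape { levels :: !Int, bits :: !Int }

-- length from the shape alone: level i holds arrays of size 2^i,
-- and either one or two of them depending on bit i.
shapeLength :: Shape -> Int
shapeLength (Shape k b) =
  sum [ 2 ^ i * (if testBit b i then 2 else 1) | i <- [0 .. k - 1] ]
```

Appends could similarly consult the two shapes to plan the merged structure before forcing either spine.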

treeowl commented 4 years ago

@Lysxia, what's your intuition on this?

Lysxia commented 4 years ago

I may be missing some context. IIUC, the difference is that forcing the spine no longer forces the vectors.

treeowl commented 4 years ago

length is currently linear amortized time because it has to force the whole spine, which may trigger array appends and splits. With this alternative, we can count how many arrays are in each level without forcing anything. So yes, your overall understanding seems right. But it's not just length; it's also shape, which looks important for appending.

Lysxia commented 4 years ago

I feel like the access patterns this would benefit are too peculiar to be worth it. It's easier to say that random access, and operations like length that must have a view of the whole sequence instead of just the ends, lose the lazy amortization guarantees. Only eager amortization is left, so things might as well be forced indiscriminately.

treeowl commented 4 years ago

They don't lose the amortization guarantees regardless. Of the functions I've listed, only length could change its amortized bounds. For the rest, it's just about constant factors.

treeowl commented 4 years ago

Part of the thinking on this whole approach relates to your notion of debit provenance, which is a little more refined than Okasaki's debit passing rule.

treeowl commented 4 years ago

One yuck point to the wholesale change: we'd have to manage strictness very much "by hand". It's not terrible, but it's another place to make mistakes. On the plus side, if we ever want a worst-case version (for cache performance under certain assumptions, not total space), then we'll need something like that and more.
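To make the "by hand" point concrete, here is a hypothetical sketch (not the library's actual types) of reintroducing laziness field by field once the spine is strict: a non-strict wrapper holds the suspension, so forcing the spine's shape never forces the payloads.

```haskell
-- A non-strict wrapper: any suspension we still want lives here.
data Lazy a = Lazy a

-- A spine with strict shape but explicitly lazy contents.
data Spine a
  = Nil
  | Cons !Int (Lazy a) !(Spine a)

-- Walking the spine forces its shape, but not the payloads:
-- spineLen (Cons 1 (Lazy undefined) Nil) is 1, not bottom.
spineLen :: Spine a -> Int
spineLen Nil = 0
spineLen (Cons _ _ rest) = 1 + spineLen rest
```

Every field like `Lazy a` is a place where a misplaced bang pattern or `seq` could silently change the amortization story, which is the "another place to make mistakes" above.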