treeowl / compact-sequences

Stacks and queues with compact representations
BSD 3-Clause "New" or "Revised" License
16 stars 4 forks source link

doc request: sources #19

Open jwaldmann opened 4 years ago

jwaldmann commented 4 years ago

Hi.

I came here from your message https://mail.haskell.org/pipermail/haskell-cafe/2020-August/132593.html

Could you please add (in the README) a few words on this data structure(s) and their use cases? When should I prefer it over Data.Sequence (etc.)? (when space is important?) Link to the paper/talk that has this info? Does this idea exist in the literature already?

Thanks - J.

treeowl commented 4 years ago

I don't know if this idea exists in the literature or not. I haven't seen it myself, but that really means very little. As a non-academic, I'm not even sure how I'd approach the question. If you can help with that, please let me know. My Haskell Love talk starts just after 4:10 in this video; I expect the properly broken-out/edited version will go up sometime in the next week or so.

When/if the structure is valuable in practice remains to be seen. My best guess is that it will make sense in applications with light to moderate use of persistence. We'll really need to address #3 and perhaps #10 to get a really good idea of practical performance. The baseline versions here will have quite a lot of indirection and churn right at the tippy top of the structures (the first 2 to 4 nodes or so) that doesn't remotely pay for itself.

Data.Sequence is a much more flexible data structure, which will be far better if you need to do a lot of splitting or appending.

Lysxia commented 4 years ago

One way to search the literature is to find any related work (in this case, Okasaki's thesis), and then look up papers citing and cited by it in Google Scholar, possibly with some keywords (e.g., "functional queue") to narrow the search.

From a very cursory search, the idea of arrays in powers of two is definitely out there, although I didn't see this exact configuration.

treeowl commented 4 years ago

@Lysxia, did you see any power-of-two arrays in related configurations I might want to take a look at?

treeowl commented 4 years ago

I'd be especially interested in anything that might point the way toward either o(n) space overhead in o(log n) time or a proof that that's impossible in the context of GHC Haskell.

Lysxia commented 4 years ago

I can't guarantee you'll find something worthwhile, but two papers that caught my eye are

treeowl commented 4 years ago

I'll be somewhat surprised if the Kaplan and Tarjan paper proves relevant. My general impression is that their structures there are closely related to Okasaki's, but with much more complicated number systems to get worst-case bounds instead of amortized ones. Thanks for pointing out the other paper; I haven't encountered it before and I'll give it a skim.

treeowl commented 4 years ago

Ooh, the edited video is up!