Should queues track size?

treeowl commented 3 years ago

It's possible to calculate the size in amortized O(log n) time. Once #24 is finished, that will improve to worst-case O(log n) time (except in the presence of pathological nesting of mapMonotonic calls). Is it really worth tracking the size to get that to O(1)? It's not terribly expensive, but it's one more word of allocation on insert.

konsumlamm commented 2 years ago

Imo size should stay O(1). That's what people expect and relaxing it to O(log n) doesn't improve any other complexities (and I doubt the performance gains in other functions would be significant), so I don't think it's worth it.

konsumlamm commented 2 years ago

Superseded by #33.

treeowl commented 1 year ago

I think we should open this idea back up. I disagree that people generally expect size to be $O(1)$. It isn't for IntMap or HashMap, for example. I would actually be surprised if a substantial number of algorithms needed the size at all.

konsumlamm commented 1 year ago

I think we should open this idea back up. I disagree that people generally expect size to be O(1). It isn't for IntMap or HashMap, for example. I would actually be surprised if a substantial number of algorithms needed the size at all.

Well, I do generally expect size to be O(1). I'd also expect IntMap and HashMap to have O(1) size, fwiw (and I'm not the only one, see https://github.com/haskell/containers/issues/763 and https://github.com/haskell-unordered-containers/unordered-containers/issues/138). I also don't know any other language where size for standard collections isn't O(1) (save for lists in FP languages).

For IntMap and HashMap, you could argue that it makes operations like union, difference & intersection more complicated, but I don't see that being the case here (e.g. union can just add the sizes, since duplicate keys are allowed).

A few examples where you want fast size, that I can think of off the top of my head:

wanting to check that the queue doesn't exceed a maximum size
checking if an operation like take/drop will "succeed (i.e. if the operation actually takes/drops the desired amount of elements)
choosing different strategies depending on the number of elements (e.g. in a scheduler)
generally keeping track of how many items you have (e.g. for debugging, UI)

treeowl commented 1 year ago

FWIW, I've dug into all the packages on Hackage that show a dependency on pqueue. Here are the results:

Real dependencies

exference: USES SIZE. However, to my (admittedly untrained) eye, it looks like it's being used in contexts where the other things going on are going to be considerably more expensive than a log-time size calculation.

reactive-banana: does not use size chr-core: does not use size beam-migrate: does not use size LPPaver: does not use size patch-image: does not use size

False dependencies

uhc-util — false dependency; opened PR to remove time-warp — abandoned project; archived Git repo second-transfer — Github master no longer depends on pqueue. The project also looks abandoned. magic-wormhole — false dependency; opened PR to remove

If, as this limited survey suggests, using size is rare in practice, it would seem potentially reasonable to make "sized queues" that store their size separately. This will be less efficient for applications that use those (except where they can unpack the queues), but more efficient for other applications.

konsumlamm commented 1 year ago

TBH, I think this primarily shows that not many packages depend on pqueue (at least on Hackage). I'm not convinced that removing the size field is worth it. Not having O(1) size just feels wrong.

lspitzner / pqueue

Should queues track size? #25

Real dependencies

False dependencies