Open treeowl opened 3 years ago
Imo size
should stay O(1). That's what people expect and relaxing it to O(log n) doesn't improve any other complexities (and I doubt the performance gains in other functions would be significant), so I don't think it's worth it.
Superseded by #33.
I think we should open this idea back up. I disagree that people generally expect size to be $O(1)$. It isn't for IntMap
or HashMap
, for example. I would actually be surprised if a substantial number of algorithms needed the size at all.
I think we should open this idea back up. I disagree that people generally expect size to be O(1). It isn't for
IntMap
orHashMap
, for example. I would actually be surprised if a substantial number of algorithms needed the size at all.
Well, I do generally expect size
to be O(1). I'd also expect IntMap
and HashMap
to have O(1) size
, fwiw (and I'm not the only one, see https://github.com/haskell/containers/issues/763 and https://github.com/haskell-unordered-containers/unordered-containers/issues/138). I also don't know any other language where size
for standard collections isn't O(1) (save for lists in FP languages).
For IntMap
and HashMap
, you could argue that it makes operations like union, difference & intersection more complicated, but I don't see that being the case here (e.g. union
can just add the sizes, since duplicate keys are allowed).
A few examples where you want fast size
, that I can think of off the top of my head:
take
/drop
will "succeed (i.e. if the operation actually takes/drops the desired amount of elements)FWIW, I've dug into all the packages on Hackage that show a dependency on pqueue
. Here are the results:
exference
: USES SIZE. However, to my (admittedly untrained) eye, it looks like it's being used in contexts where the other things going on are going to be considerably more expensive than a log-time size calculation.
reactive-banana
: does not use size
chr-core
: does not use size
beam-migrate
: does not use size
LPPaver
: does not use size
patch-image
: does not use size
uhc-util
— false dependency; opened PR to remove
time-warp
— abandoned project; archived Git repo
second-transfer
— Github master no longer depends on pqueue. The project also looks abandoned.
magic-wormhole
— false dependency; opened PR to remove
If, as this limited survey suggests, using size
is rare in practice, it would seem potentially reasonable to make "sized queues" that store their size separately. This will be less efficient for applications that use those (except where they can unpack the queues), but more efficient for other applications.
TBH, I think this primarily shows that not many packages depend on pqueue
(at least on Hackage). I'm not convinced that removing the size field is worth it. Not having O(1) size just feels wrong.
It's possible to calculate the size in amortized
O(log n)
time. Once #24 is finished, that will improve to worst-caseO(log n)
time (except in the presence of pathological nesting ofmapMonotonic
calls). Is it really worth tracking the size to get that toO(1)
? It's not terribly expensive, but it's one more word of allocation oninsert
.