hrldcpr / pcollections

A Persistent Java Collections Library
Other
765 stars 79 forks source link

OrderedPSet.minus is linear time #95

Closed prdoyle closed 2 years ago

prdoyle commented 2 years ago

OrderedPSet.minus calls TreePVector.minus which loops through all the entries searching for the right one to remove.

Possible fix?

Rather than augmenting the PVector with a PSet, use something like a PMap<E, Integer> that maps each entry to its index in the vector.

Boxing all these ints would add overhead, but at least it would have the right asymptotic complexity.

Alternately, it might be a good idea to have an IntTree that maps ints to ints

hrldcpr commented 2 years ago

Great point! This is one of the few classes I didn't write, I hadn't even noticed that…

It's definitely tempting to just duplicate IntTree.java and replace V with int… I think I'd be down with that. (Though someday maybe such nonsense won't be necessary! https://openjdk.org/jeps/8261529)

I can probably do this pretty quickly/easily, but if not pull requests are always welcome!

hrldcpr commented 2 years ago

…oh but actually, why would we want int-to-int IntTrees? I'm missing how that's relevant here.

So in the meantime I think I'll just use a PMap<E, Integer>

hrldcpr commented 2 years ago

Now that I think about it, I don't think it's quite as simple as replacing the PSet with a PMap storing the indices, because multiple indices may change after minus() is called.

There could be a long id which we increment for every new element (would take hundreds of years at 1Gops/s to overflow) and then a PMap<E, Long> for fast element lookup and a PSortedMap<Long, E> for fast iteration. I'll throw together a PR for that unless you have a better idea

prdoyle commented 2 years ago

Good point. This is tricky!

prdoyle commented 2 years ago

After a coffee I thought about this and arrived independently at the same solution 😂.

If someone's really worried, they could "re-pack" the longs if they get out of hand, but I imagine that kind of code that practically never runs is more likely to do harm than good when it finally awakens.

prdoyle commented 2 years ago

I forget what I was thinking with the int-to-int trees FWIW. 🤔

prdoyle commented 2 years ago

It just occurred to me: if you get your idea working, it might also allow us to make an OrderedPMap. 🤔

prdoyle commented 2 years ago

(I mention this because OrderedPMap is actually what I wanted. I wanted a persistent LinkedHashMap really.)

hrldcpr commented 2 years ago

Good to know! I wrote the new ordered set, just gonna add some benchmarks to make sure there's no performance regression, and then should be easy to do ordered map too. I'll tag you in the PR once it's finished.

hrldcpr commented 2 years ago

(OrderedPMap is in! #102 )