vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.81k stars 199 forks source link

Introduced `MovableBidirectionalIterator` #283

Open catap opened 1 year ago

catap commented 1 year ago

This is a memory optimization for the case when a user required to modify iterator by moving it to a desire position. This works the same way as iteraotr(fromElement) but doesn't create a new iterator that decreases memory footprint at an algorithms which makes a lot of jumps.

catap commented 1 year ago

Just pushed a trivial unit test that's proof that it works as expected. I've also fixed formating by using tab, instead of space which makes generated code looks nice.

catap commented 1 year ago

And some wording for the doc that makes it a bit cleaner.

catap commented 1 year ago

Realized that I've missed SubsetIterator. Fixed.

catap commented 1 year ago

...and adding the same jump to *RBTreeMap's and *AVLTreeMap's both sets: keySet and entrySet.

catap commented 1 year ago

I hope that I've added it at all sorted Sets now.

catap commented 1 year ago

Added dummy implementation to empty sets

catap commented 1 year ago

@vigna any thoughts?

vigna commented 1 year ago

Sorry, I somehow lost track of this. The problem I see is that it does not make any sense at the interface level—iterators do not necessarily come from sorted collections.

catap commented 1 year ago

Sorry, I somehow lost track of this. The problem I see is that it does not make any sense at the interface level—iterators do not necessarily come from sorted collections.

yep, and I haven't found SortedIterator or something like that.

Anyway, it is quite useful at my case :) and I'll be very appricited if it can be merged.

vigna commented 1 year ago

On 24 Mar 2023, at 14:03, Kirill A. Korinsky @.***> wrote:

Anyway, it is quite useful at my case :) and I'll be very appricited if it can be merged.

I would probably merge in the implementations, but not the interface (as I said, it's not really sensible).

Ciao,

                                                     seba
catap commented 1 year ago

I see your point and I agree with it, but I haven't found any nice way to inject that jump. Introducing a conception of JumpingIterator seems like an overkill.

catap commented 1 year ago

@vigna I'm willing to update this PR to make it easy to merge. To do it, may I ask some guidlines? Thanks!

vigna commented 1 year ago

I really don't know. If this would be in Rust everything would be easier as we could combine traits. In the present situation I don't see alternatives to a JumpableIterator-or-something interface. I'm also not entirely convinced of the word "jump"—it suggest moving forward in the iterator sequence, but you're actually implementing a "move". In which case I'd probably have a "rewind()" method (move to the first item, like creating a new iterator) and a "from" method. That is, you can do after creation what you'd do at creation.

Another problem is even if we go for the interface, there's no obvious way to have default implementations, and definitely I want to avoid other optional methods. Any thoughts?

All this aside, do you have any evidence that in your case you have a measurable impact on performance? Small objects that are quickly used and discarded have almost zero impact with modern GCs.

catap commented 1 year ago

Let me start from the side note. I do have and this was made to make one services quite happy :) Long story short: it decreased response time about 2x in general. And for this services response time is one of critical things :)

The usage of this feature is covered by micro benchmark, and here an example of JMH output with jump:

Benchmark                 (fields)  (ors)  (queries)  Mode  Cnt     Score   Error   Units
both                             5      5          5  avgt   12    65.028 ± 1.266   us/op
both:·gc.alloc.rate              5      5          5  avgt   12    34.850 ± 0.690  MB/sec
both:·gc.alloc.rate.norm         5      5          5  avgt   12  2376.015 ± 0.002    B/op
both:·gc.count                   5      5          5  avgt   12     3.000          counts
both:·gc.time                    5      5          5  avgt   12     5.000              ms

and when I've replaced jump into recreating a brand new iterator:

Benchmark                 (fields)  (ors)  (queries)  Mode  Cnt     Score   Error   Units
both                             5      5          5  avgt   12    64.638 ± 1.337   us/op
both:·gc.alloc.rate              5      5          5  avgt   12   108.726 ± 2.228  MB/sec
both:·gc.alloc.rate.norm         5      5          5  avgt   12  7368.014 ± 0.002    B/op
both:·gc.count                   5      5          5  avgt   12    11.000          counts
both:·gc.time                    5      5          5  avgt   12    20.000              ms

As you may see impact into memory footprint is dramatical, and into speed is insignificant. So, as you had say it hasn't got anything in performance and mainly with GC.

This micro benchmark is runs against faaaar less collections that a production one and production has impact more considerable to be honest, plus it affect in number and period of GC cycles...

The services is running with Shenandoah as GC, I assume it modern enough :) and still less objects make things better and save heap for something else quit useful.

Regarding your approach => I'll rename everything, and try to create a dedicated iterator, maybe it won't be as ugly as I though. Let see.

catap commented 1 year ago

@vigna do you prefer one more commit with rename and new iterator, or force push with rebased commit?

catap commented 1 year ago

here it is.

I've introduced MovableBidirectionalIterator which contains two functions: move(fromElement) and rewind().

It was cleaner that I've expected to be honest.

vigna commented 1 year ago

Thinking about it. We're talking about bidirectional iterators. If we have rewind(), we need also something moving to the end. Or not? Just thinking. For example, if we're gonna implement those for linked hash maps, you could get to the start or end in constant time. Maybe begin() and end()? I can't think of an "end" version of rewind().

catap commented 1 year ago

I agree about moving to the end, but I was blocked because I can’t find the right naming :)

begin() and end() seems like a position and not a verb.

catap commented 1 year ago

@vigna after some thoughts I agree that begin(), move(..) and end() is the best naming. It already has next() and previous(), so, my assumption that it isn't clear that it is a verb won't hold.

I've added support end() also, and heavy reworked unit tests.

This is ready for review.

vigna commented 1 year ago

On 17 Apr 2023, at 15:44, Kirill A. Korinsky @.***> wrote:

And I do have one more "strange" idea on this iterator: mark() and reset() with behaviour similar to same function at InputStream. What do you think about it? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

It seems to put too much strain on implementors. No?

Ciao,

                                                     seba
catap commented 1 year ago

@vigna yes, after a bit thinking of that mark-reset I've dismissed an idea and removed commit :) because it too wired

catap commented 1 year ago

@vigna any thought on that?

vigna commented 1 year ago

Really on vacation now :). Ferragosto in Italy is sacred. Let's talk about this mid-September...

catap commented 1 year ago

@vigna how does vacation goes? :)

catap commented 1 year ago

@vigna just rebased it to the last master

vigna commented 11 months ago

Ok, I know it's like forever but I'm looking into this. There should be no backward compatibility problem because you have default methods right? Just looking around.

catap commented 11 months ago

@vigna yes, no backward compatibility issues as far as I see. The default method which triggers an exception for this.

vigna commented 11 months ago

@vigna yes, no backward compatibility issues as far as I see. The default method which triggers an exception for this.

What?

catap commented 11 months ago

@vigna yes, no backward compatibility issues as far as I see. The default method which triggers an exception for this.

What?

I mean that if someone call new method and hadn't got implemented it, an exception will be thrown.