vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.78k stars 196 forks source link

Finalize as many delegating methods as possible (maybe) #214

Closed vigna closed 3 years ago

vigna commented 3 years ago

With the design currently taking shape, every method that can accept a JDK primitive version (possibly widened/narrowed) has three overloads:

The object-based versions are deprecated, so overriding is unlikely. Presently, we warn about overriding and we try to prevent it by finalized delegating methods only for the secondary overload in the first case. Maybe we should do it (both warning in the Javadoc and finalization) also in the second case.

NOTE: the delegating method (first case, second overload) need not to refer explicitly to a class. It is sufficient to delegate with a cast to the JDK primitive version. Referring explicitly to a class prevents from delegating from two or more layers up.

However, maybe a warning in the documentation is enough. While in the first case there's really no imaginable reason for overriding the method, in the second case there might be efficiency reasons to reimplement it.

The situation is different and much more complicated for the Map compute* methods and will have to be handled separately.

techsy730 commented 3 years ago

Do you want me to take care of this, or are you already working on it?

I also think things like int/long/doubleIterator should be made final as well, as by spec these are supposed to be the same as just regular iterator for non widened types.

vigna commented 3 years ago

Do you want me to take care of this, or are you already working on it?

Well, it's open to discussion—that's why there's a "maybe". Do we have use cases in which it is sensible to override the second case?

I also think things like int/long/doubleIterator should be made final as well, as by spec these are supposed to be the same as just regular iterator for non widened types.

I don't understand what you're saying here. Make final what exactly?

incaseoftrouble commented 3 years ago

In general, I am in favour of not finalizing anything except when its really necessary (and even then, I'd think twice about it, more often than not I've found myself wanting to tweak some stuff but couldn't because of private / final things, such as, e.g., an internal backing array not being accessible). IDEs show when you are overriding a deprecated method, so you typically won't do this by accident. A possible reason to override would be, for example, to add logging (yes, one could also do that with a debugger but maybe you want to run it in production to trace something or whatever).

Usually, the "default" should be defensive (final / private), however, IMO data structures should be open / extensible / hackable. Here, it shouldn't be "do we find a reason to leave this accessible / non-final?" but rather "is there really a reason to make it private / final?". For me, another argument is that (typically) users will extend data structures only if they want to add a specific behaviour, so they have to familiarize themselves with the details anyway.

vigna commented 3 years ago

The problem is that we do not want to deprecate the method that creates a lambda, as it is reasonable in some cases. This means that people could override it by mistake. The idea is that there is a single method actually implemented: the others delegate by a cast or a lambda. It is true that the lambda thing might be implemented more efficiently, so probably we should not override that—just putting a big alert. The method delegating by casting can be safely finalized IMHO (as we're trying to do now).

incaseoftrouble commented 3 years ago

I see your point. As I said, if the answer to "is there really a reason to make it private / final?" is yes, then it surely makes sense. Its just that in this particular case (in contrast to usual practice) we shouldn't make things final just because we can IMO.

vigna commented 3 years ago

I agree. There's also a second problem that makes me dubious about the whole finalization thing. In this way, the method gets two delegations—one from the finalized method to the default method in the superclass (or upper in the hierarchy), and then the delegation by casting. While it is possible that JITs will inline all of this, we are making just an educated guess, and we are talking about the method that will be bound to lambdas. For iterating methods such as forEach() that might be not so bad, but computeIfAbsent(), for example, would definitely not benefit from a double layer of delegation, and I'd like to have a uniform policy.

There's also a danger in having some of these methods finalized, and some not, depending on the existence of abstract classes in the middle, of efficiency. I might create a false sense of security (as in "if I can override it, its' good").

Maybe it is better to have a clear @apiNote explaining that that method should not be overridden. We have already @apiNote tags explaining which methods should be overridden, so that would be a kind of symmetric treatment. The methods accepting objects are deprecated as always and do not pose a threat.

techsy730 commented 3 years ago

The reason I made the forEach(\<fastutil PrimitiveConsumer>) final is because it would be so easy for subclasses to get the wrong one.

For example, in this very realistic example.

import it.unimi.dsi.fastutil.ints.*;

class MyIntList extends AbstractIntList {
  // ...
  @Override
  public forEach(IntConsumer i) {
    // A cool forEach that probably won't get used because this is the wrong overload; you need the JDK IntConsumer
    // but the import makes it the fastutil one.
  }

  // It is even more insidious with forEachRemaining in Iterators and Spliterators, which streams will use.
}

Because it is that trivial to get the wrong one, I decided to make it very obvious if you do this (assuming you use the abstract collection based ones). The others I think are fine to leave as an @apiNote, but this one in particular is very easy to get wrong from a simple * import, and lose a lot of potential performance.

vigna commented 3 years ago

Yeah, I agree on this particular case (int/long/double functors with fastutil-primitive arguments). Since this is the "natural" method for fastutil, people will tend to override that. The confusion is augmented by the fact that the idea is perfectly correct for all other types. So yes, an @apiNote in that case is probably not good enough (yet, we need an @apiNote for uniformity and for handling cases in which finalization is not possible).

So I'd say let's finalize those as much as possible, and put the standard @apiNote we are putting everywhere else (widening/narrowing methods).

Probably the whole "computeIf" system needs a different treatment, because they are methods with a different semantics. I don't want users to get confused with that. So I'll add an @apiNote explaining that there is no delegation.