Reorganize assertions, defines, and relationship between JDK's and fastutil's primitive-based functional interfaces

vigna commented 3 years ago

This is an umbrella issue for discussing two problems that are becoming very visible in these days in the codebase:

The overlap of different people implementing different conditional features has led to a proliferation of complex assertions—the semantics of which, however, is sometime much simpler than one would expect (e.g., it's just a matter of being int/long/double or any other type). All contributors added stuff to gencsources.sh, resulting in a large collection of #define's without any comment. Ten years ago that would have been OK, but with the amount of subtly convoluted conditional compilation we have now this is no longer acceptable. It has become very difficult (for me) to understand which code is being generated. There must be some documentation for the #define's, and clearly named assertions.
The strategy for implementing functional interfaces or callers of functional interfaces in the presence of multiple possible implementations (e.g., object-based, JDK-primitive, or fastutil-primitive) is not uniform throughout the codebase. I find myself thinking most of the time "shouldn't this be implemented the other way around?". This is not healthy (oh, of course maybe I'm just old). Once again, ten years ago there were so few cases that looking around in other parts of the code was sufficient to understand what was the best practice, but this is no longer the case.

For the first part, it is mostly a matter of restructuring the file and commenting.

For the second part, I must write somewhere what is the preferred design. I'll keep updating this text, and I invite everybody working on fastutil to comment below.

Now, given the premises, there are things we cannot change:

We want to benefit from code improvements upstream. This means that anything primitive that has been implemented in the JDK must not be overridden. For example, IntIerator.forEachRemaining() has primitive implementations in the JDK, so we should use those (possibly all of them—also the ones accepting object-based consumers).
Lambdas will be fastutil-primitive. This is unavoidable because fastutil's types join the object and primitive JDK world, and this is the only possibility to avoid ambiguity while continuing fastutil's tradition of adding primitive-based operation without eliminating the object-based ones. As already implemented by @techsy730 in a number of cases, for methods accepting functional interfaces there must the usual deprecated object-based one, a fastutil-primitive one, and a JDK-primitive one if it is possible. We have by now functions (unary operators), predicates, consumers and binary operators, which should cover everything type-specific presently (except of course for two-argument functions in which the three types involved are not all the same, as in Map.computeIf*()).

The solution chosen is as follows (at present time, for Consumer/Predicate-related methods):

in the int/long/double case, we inherit from the JDK or implement the overload accepting a JDK type-specific functor. An @apiNote will explain that this is the method to override.
in the remaining case, the overload accepting a fastutil type-specific functor will be implemented, instead. Again, an @apiNote will explain that this is the method to override.
in the int/long/double case, the overload accepting a fastutil type-specific functor delegates to the overload accepting a JDK type-specific functor. A WARNING will explain that this method should not be overridden.
in the remaining cases, the overload accepting a widened JDK type-specific functor checks for the functor being an instance of fastutil type-specific functor, in which cases passes directly the functor with a cast, or generates a new lambda using the only method of the functor. An @apiNote that will explain that this method incurs in a slight overhead.

To make the implementation easy, the functor type associated with the actual implementation is stored in a macro, so the implementation of the first two cases is the same. The other implementations are contained within conditional code, which makes it possible to add appropriate comments. See Consumer.drv for an example.

Another issue is whether to use the JDK-primitive method names, if available, instead of fastutil's names. In principle I'm against this option because fastutil has always avoided adding disambiguating strings to methods if the type of the arguments was sufficient. So a BinaryOperator should have an apply() method. The applyAsInt() coming from the JDK should be delegated with a default method to apply(). This delegation would actually never happen inside fastutil at runtime if we proceed as explained above, because a ready-made JDK-primitive thing would be turned into a new lambda in which the JDK-primitive method is used to implement the fastutil-primitive method.

techsy730 commented 3 years ago

One low hanging fruit is the #if defined JDK_PRIMITIVE_CONSUMER tests This is identical to #if KEYS_PRIMITIVE && !KEY_WIDENED && , which is much easier to read.

Same with ..._PREDICATE

vigna commented 3 years ago

Speaking of that, I think we should move from ! KEY_WIDENED and similar to something like KEY_INT_LONG_DOUBLE. It can be reused more easily.

vigna commented 3 years ago

BTW, any comments on the other points? For example, removeIf() and forEach() have implementations with opposite design criteria. Above I'm favoring the removeIf() design.

techsy730 commented 3 years ago

Oh yeah, I see that I put the widened case first for removeIf but the non-widened case first for forEach. That should be easy enough to flip around. Also, I see I took advantage of METHOD_ARG_KEY_CONSUMER for the forEach case but didn't take advantage of METHOD_ARG_PREDICATE in removeIf. I can save one method defined in an #if block if I do that.

techsy730 commented 3 years ago

As for the macros, here is what I am thinking.

Add KEY_INT_LONG_DOUBLE.
Replace uses of KEY_WIDENED with KEYS_PRIMITIVE && !KEY_INT_LONG_DOUBLE (this can likely be simplified to !KEY_INT_LONG_DOUBLE in a lot of places as many uses are already in a #if KEYS_PRIMITIVE block)
Replace ! KEY_WIDENED with KEY_INT_LONG_DOUBLE.
Remove KEY_WIDENED

Similar for VALUE.

Add some documentation and comments somewhere saying that the only macros you should use in conditionals in .drv files are

KEYS_PRIMITIVE and KEYS_REFERENCE
KEY_INT_LONG_DOUBLE
KEY_CLASS_<type>
SMALL_TYPES (or any other macro intended for build time feature selection)
Any VALUE version of the above
Any logical combination of the above (and, or, not), though try to keep it as simple as possible if you do that

Type declarations would not be subject to this; they are the whole reason we define macros after all. Only #if statements.

Then go through all existing conditionals and update them to comply with this (there isn't that many that need to be updated from the looks of it)

Establish a standard order for #if/#else binary options such as

KEYS_PRIMITIVE or KEYS_REFERENCE
KEY_INT_LONG_DOUBLE or !KEY_INT_LONG_DOUBLE and update all existing cases to use that order.

Go through gencsource.sh and add a comment documenting every macro defined that is any more complicated then

#if KEYS_PRIMITIVE (or KEYS_REFERENCE)
#define Option1
#else
#define Option2
#endif

which describes its intended use case, and usual values for the cases of a non-widened primitive, widened primitive, boolean, and reference type.

techsy730 commented 3 years ago

I know that seems like a lot of refactoring, and while it certainly would not be trivial, the macro uses that don't follow the above are the exception instead of the rule.

vigna commented 3 years ago

Oh yeah, I see that I put the widened case first for removeIf but the non-widened case first for forEach. That should be easy enough to flip around.

OK, that is good but do you have an opinion about which one should be flipped? This is my point (at least, the point of this issue). I can play with JMH but you if you have intel about that it would be great.

Also, I see I took advantage of METHOD_ARG_KEY_CONSUMER for the forEach case but didn't take advantage of METHOD_ARG_PREDICATE in removeIf. I can save one method defined in an #if block if I do that.

Let me have a look at that...

vigna commented 3 years ago

Note that, for example, in Iterator.forEachRemaining() a non-specific Consumer is turned into a lambda by casting it into a fastutil's IntConsumer, which is passed to a method that casts it to a JDK's IntConsumer. There's no need for that—one can cast to JDK's IntConsumer directly and save a delegation. It is this sort of stuff that makes me feel I'm walking on eggs.

UPDATE: wrong, it's a single method call, so the compiler chooses the most-specific type. The second cast has no effect—it can be to fastutil's or JDK's operator.

vigna commented 3 years ago

OK, that is good but do you have an opinion about which one should be flipped? This is my point (at least, the point of this issue). I can play with JMH but you if you have intel about that it would be great.

OK, after some playing with javap it looks like generating lambda by casting a method is quite expensive (there's bootstrap code, an invokedynamic instruction etc. exactly as when creating a new lambda). So that should not be in the preferred execution path.

vigna commented 3 years ago

All in all, the way I see things now, int/long/double implementations should implement the call using the JDK primitive types, or inherit from the JDK; fastutil's version should be a delegation with cast, which we hope will be inlined by the JIT (this is of course unfortunate, as it means that lambdas will go through the delegation, but for iterative methods this should not be a problem; for non-iterative methods we might consider some code duplication). All other types should implement fastutil's version.

For what matters the trivial extensions to JDK's primitive consumers and predicates by type widening, we should do the opposite: the JDK-based implementation should delegate by casting appropriately the accept method. This makes it possible to use sensible ready-made JDK-primitive consumers (e.g., computing statistics). It does not seem a good idea to follow the practice of the int/long/double case because it would mean that lambdas would undergo widening casts at each call.

In particular, presently both fastutil Predicate and Consumer fail to have a disambiguation method in the int/long/double case. It is not possible to use lambdas with or(), and(), andThen(), etc.

vigna commented 3 years ago

I have committed a new version of Consumer that tries to follows the guidelines above, in particular it uses only KEYS_INT_LONG_DOUBLE for its conditional code: 4f97e83a1

I am not 100% happy because the opposite treatment of int/long/double and the rest gives rise to not-so-nice conditional code, but it seems to be reasonable.

I have de-deprecated andThen() with a JDK primitive operator because while it's slower, it is the only possibility for concatenating such an operator. I added an @implNote explaining the issue. While using apply(int) on a CharConsumer is obviously wrong, concatenating a ready-made JDK IntConsumer should be OK.

UPDATE: I have also updated Consumer. I realized that the small code duplication is better than the crazy conditionals.

techsy730 commented 3 years ago

Oh sorry, I didn't mean to dump all of the above ideas on you. I was going to take on some of this. Just my main job had stuff come up.

I'm taking a look at removeIf and forEach now. I'll update them to comply with the best practices above too.

vigna commented 3 years ago

No no, I like people dumping ideas and discussing. It is always a bit complicated to take decisions in fastutil because the ramifications of the consequences are never immediately evident. So discussion is fundamental.

vigna commented 3 years ago

OK, great, I'd say I'm quite satisfied with these. Now we have to carefully go through all methods invoking a Consumer or a Predicate and check that they follow the same pattern. But, tomorrow :).

vigna commented 3 years ago

So, if I understand correctly AbstractIterator has a new life with the purpose of deriving from an abstract class that makes the "wrong" forEach() final. If we wanna keep that as it is the class must be de-deprecated and the comments removed, as now it does not make much sense.

The other observation is: do we want to finalize also the other "wrong" method? Presently we finalize only in the int/long/double case the method that delegates by casting. Shouldn't we also finalize the method that in the non-int/long/double case creates a lambda? That would be also a source of wrong overrides.

vigna commented 3 years ago

Also: why we cannot get rid of the shim class and put the finalize method in AbstractIterator? It seems to me it would work in the same way.

vigna commented 3 years ago

I want through Iterators.drv and I think I refactored correctly all #define's. However, I'd really need you (@techsy730 ) to have a look.

It seems to me that in a few cases there are nested #if's that will always be evaluated in the same way (e.g., see line 701). I'm a bit afraid of touching that tho.

Going through the refactoring, I ended up writing a lot of

#if KEYS_PRIMITIVE && ! KEYS_INT_LONG_DOUBLE && ! KEY_CLASS_Boolean

which is exactly KEY_WIDENED. I'm wondering whether to have a KEYS_BYTE_SHORT_CHAR_FLOAT replacing KEY_WIDENED.

vigna commented 3 years ago

Another question: Lists.drv contains code that is conditional on SMALL_TYPES. This is a problem if we want to split the distribution. One thing is to put different classes in different places, another thing having different conditional code. Also, I cannot understand why it is useful—if SMALL_TYPES is not true I guess the class is not even generated.

vigna commented 3 years ago

@techsy730 , I almost completely got rid of KEY_WIDENED. There are a couple of TODOs in the .drv file (like "check these overloads) that I'm not completely convinced of (there seems to be some code duplications). Please also look for the few remaining tests on KEY_WIDENED.

The problem is that if I botched the logic and I'm just missing some override that increases efficiency we'll have not way to know. :(

techsy730 commented 3 years ago

Haven't finished reviewing yet, but I see you added a KEYS_BYTE_CHAR_SHORT_FLOAT. Cool.

Would you like me to replace my uses of

#if KEYS_INT_LONG_DOUBLE
// ...stuff
#elif !KEY_CLASS_Boolean
// ...widened stuff
#endif

with

#if KEYS_INT_LONG_DOUBLE
// ...stuff
#elif KEYS_BYTE_CHAR_SHORT_FLOAT
// ...widened stuff
#endif

?

And similarly replace #if KEYS_PRIMITIVE && !KEYS_INT_LONG_DOUBLE && !KEY_CLASS_Boolean with #if KEYS_BYTE_CHAR_SHORT_FLOAT

vigna commented 3 years ago

Would you like me to replace my uses of

Well, it might be a good idea. It is a known fact that negative conditions are always less understandable than positive conditions. So whenever we can give a positive test, it's better.

vigna commented 3 years ago

@techsy730 , a gentle ping for the questions above...

vigna commented 3 years ago

There's also a non-type-specific replaceAll() in empty type-specific lists that is not deprecated (so it generates warnings) and that does not throw an UnsupportedOperationException(). Is there a reason for that? I thought we were using the "syntactic convention" (all mutation methods cause exceptions, even if they wouldn't change the data structure).

techsy730 commented 3 years ago

Oh sorry for missing the questions.

I reviewed (and resolved any issues) with your macro in one of the PRs.

Yes, SMALL_TYPES is safe to remove for all but widening cases (e.g. IntIterators having a method wrap that takes a ShortIterator and widens it to an IntIterator). I considered having these methods in ShortIterator instead as that is already conditionally compiled based on SMALL_TYPES but I think it caused the code gen to be more gnarly. But I can't remember; I may try to move them and see what that does, assuming it isn't already released API (if it is, we will have to live with it for at least one version for deprecation, assuming we do go forward to that).

We should probably be consistent and deprecate the Object based replaceAlls in primitive collections to be consistent.

For the shim classes; that was more me wanting to hide non-meaningful API details in a non-public class (not to mention a non-deprecated one). It would work just as fine in AbstractIterator. But we would have to keep it for Spliterator as there is no AbstractSpliterator. Not to mention not all of our spliterators extend the AbstractIndexBased ones (though I suppose those are harmless as they aren't public classes; users can't override the "wrong" methods anyways).

We could mark final the lambda generating object based ones in the primitive classes. I agree there is very little reason for a user class to override them. But I didn't do so at the time as they will already get yelled at because it is a deprecated method and won't accidentally be triggered by a import it.unimi.dsi.fastutil.ints.* like disambiguation one does.

vigna commented 3 years ago

They are not deprecated—there's just a note explaining that it would be better to use a type-specific method. So it would be better, in the interest of coherence, to keep the user, when possible, from overriding that method.

BTW, replaceAll() in the case of widening/narrowing copies the code instead of invoking the standard version with a new lambda. This can cause some problems if the user overrides the main method for some reason. Is there any particular reason not to generate a lambda and pass it to the main overridible method?

vigna commented 3 years ago

Oh sorry for missing the questions.

I reviewed (and resolved any issues) with your macro in one of the PRs.

Yes, SMALL_TYPES is safe to remove for all but widening cases (e.g. IntIterators having a method wrap that takes a ShortIterator and widens it to an IntIterator). I considered having these methods in ShortIterator instead as that is already conditionally compiled based on SMALL_TYPES but I think it caused the code gen to be more gnarly. But I can't remember; I may try to move them and see what that does, assuming it isn't already released API (if it is, we will have to live with it for at least one version for deprecation, assuming we do go forward to that).

SMALL_TYPES has always been a kind of underspecified thing, so if you want to play with it and get again a working version that'd be great. Then we can try to make two distinct jars. But, the code must be the same in both versions.

We should probably be consistent and deprecate the Object based replaceAlls in primitive collections to be consistent.

Yes, that's how it is done for all object-based methods. Would you mind taking care of that?

For the shim classes; that was more me wanting to hide non-meaningful API details in a non-public class (not to mention a non-deprecated one). It would work just as fine in AbstractIterator. But we would have to keep it for Spliterator as there is no AbstractSpliterator. Not to mention not all of our spliterators extend the AbstractIndexBased ones (though I suppose those are harmless as they aren't public classes; users can't override the "wrong" methods anyways).

I'll move it into AbstractIterator and create an AbstractSpliterator. In this way people will use them and get the finalized methods.

We could mark final the lambda generating object based ones in the primitive classes. I agree there is very little reason for a user class to override them. But I didn't do so at the time as they will already get yelled at because it is a deprecated method and won't accidentally be triggered by a import it.unimi.dsi.fastutil.ints.* like disambiguation one does.

OK, I'll open a separate issue about this as it's complicated.

vigna commented 3 years ago

@techsy730 I still see #if KEY_WIDENED in Iterator(s).drv, can you have a look? After that we can get rid of it.

vigna commented 3 years ago

Iterators.any(), all(), indexOf() lack any documentation and they do not appear to have been inspired by the JDK, so we can't refer or inherit documentation. Right?

vigna commented 3 years ago

@techsy730 still a couple of questions left: replaceAll() implementation (line 301 of List.drv) and any()/all()/indexOf() documentation.

techsy730 commented 3 years ago

I am on-call for work this week so I have been busier then normal. I'll document the any, all, and indexOf methods of iterators (though didn't those already exist before my changes?) And yeah, it probably would be cleaner to go ahead and make the non-type specific replaceAll throw UnsupportedOperationException directly instead of indirectly by forwarding it to a method that does.

vigna commented 3 years ago

I am on-call for work this week so I have been busier then normal.

There's no hurry, I made you wait 18 months 😂

I'll document the any, all, and indexOf methods of iterators (though didn't those already exist before my changes?)

git blame claims it is all your work. Unless you just moved them around or something.

And yeah, it probably would be cleaner to go ahead and make the non-type specific replaceAll throw UnsupportedOperationException directly instead of indirectly by forwarding it to a method that does.

That I already did. My question is about line 301 of List.drv. Instead of delegating a new lambda, there's a copy of the main method body. I think this is not good because if the user improves the main method this method won't get the improvements. I think we should delegate as usually with a new lambda, but I was wondering if there was a specific reason for the different implementation here.

techsy730 commented 3 years ago

Are you talking about

    default void replaceAll(final JDK_PRIMITIVE_UNARY_OPERATOR operator) {
        replaceAll(x -> KEY_NARROWING(operator.JDK_PRIMITIVE_KEY_APPLY(x)));
    }

?

That is making a new lambda. It is a lambda converting an IntUnaryOperator to ByteUnaryOperator (or similar), and then passing that to the main replaceAll method.

vigna commented 3 years ago

I just changed it. Look at it before the last commit. I think it is correct but I'd like your OK.

vigna / fastutil

Reorganize assertions, defines, and relationship between JDK's and fastutil's primitive-based functional interfaces #201