vigna / fastutil

fastutil extends the Javaâ„¢ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.8k stars 197 forks source link

Clean up some of the new "of()" static factory methods. #177

Closed techsy730 closed 3 years ago

techsy730 commented 3 years ago

Clean up the of methods()

Because of this validation, it is now worth it to have overloads of 0 and 1 element, which skip the temporary array and validation (0 and 1 element sets can never have duplication).

In the spirit of ArraySet's "just trust me" API of working raw arrays unvalidated, an it has an offshoot ofUnchecked that takes the elements as is and does no validation, like the array accepting constructors do.

vigna commented 3 years ago

Oh I didn't know. No, I would prefer to follow the platform behavior... since you're at it, would you fix it? Thanks!

techsy730 commented 3 years ago

Sure. I can fix it.

But I was thinking it would be nice to have a version of of(...) that returns an unmodifiable List/Set. That way if the size is 0 or 1, the empty or singleton version can be returned, and we can potentially swap implementations based on the number of elements.

Should that live under "List" or "Lists"? And should that be a different PR?

vigna commented 3 years ago

I have added indeed to type-specific List and Set an .of() method, presently delegating to the array-base and hash-based implementations. Is the Java version specified as immutable?

techsy730 commented 3 years ago

Yes, the versions of "of" that lives in the root collection interfaces are specified as immutable. https://docs.oracle.com/javase/9/docs/api/java/util/Set.html#of--

I think having mutable versions of these in the mutable subclasses is a fine API though. But we may want an immutable version too that behaves like the platform's version.

vigna commented 3 years ago

Well, then I'd say that's very reasonable to have an implementation .of() method that just return the implementation, and an interface .of method that returns an optimized potentially immutable result. I'm not sure I want to guarantee immutability because that will make the implementation slower. Unless we subclass OpenHashSet, say, and override the mutation methods.

What do you think?

vigna commented 3 years ago

OMG they have specialized instances up to 10 elements. Just to avoid allocation of an array, I guess...

incaseoftrouble commented 3 years ago

I think guaranteeing immutability of the result is good. This guarantees that you don't have to do defensive copies / wrapping (trading a bit of speed for good design paradigms).

OMG they have specialized instances up to 10 elements. Just to avoid allocation of an array, I guess...

And to allow for specialized implementations I guess. Maybe up to x elements it makes sense to implement contains etc. via if slides.

Actually, with ints, you could implement such immutable sets / maps with switch statements. These should be stupid fast.

vigna commented 3 years ago

switch statements? Can you elaborate on that?

incaseoftrouble commented 3 years ago

switch statements? Can you elaborate on that?

oops my bad, switch branches of course have to be compile time constants 🤦 shouldn't write things without thinking.

techsy730 commented 3 years ago

Ok, I have cleaned up the root interface of methods. Also provided some overloads to avoid temporary array creation (but only up to 3).

Right now they do return immutable lists/sets, but besides the singleton and empty case they are mere normal implementations wrapped in the type specific unmodifiable. A little bit of overhead yes, but any JVM worth anything should be able to optimize that away.

vigna commented 3 years ago

This seems to be ready to merge for me. Comments?

incaseoftrouble commented 3 years ago

This seems to be ready to merge for me. Comments?

I added some, only minor stuff / naming convention.

I'm not so sure about the value of the mutable of methods with arguments. @techsy730 Do you have a particular use case in mind for them? IMO a simple .create() or .of() is sufficient (maybe rather give those methods parameters like expectedSize etc., similar to Guava; See for example HashBasedTable's three create variants)

techsy730 commented 3 years ago

create for mutable return types and of for immutable sounds like a reasonable change to me.

As for why we have argument overloads in addition to the varargs one. It is to avoid creating an array for cases that don't need an array. For example, if we didn't have no-arg overload, then List.of() and OpenHashSet.create() called with no arguments would create a new empty array each call that is going to be thrown away.

vigna commented 3 years ago

Mmmh. No, I like .of() everywhere—my idea was that .of() in an implementation gives you an object of that implementation, .of() in the interface gives you an immutable implementation.

I'm thinking of splitting the methods of OpenHashSet, adding an intermediate ImmutableOpenHashSet class that might be useful for these implementations. It's just a matter of moving the mutation methods and probably copy-and-paste some code for the iterator.

vigna commented 3 years ago

BTW, interesting comment in Guava:

 @SafeVarargs // For Eclipse. For internal javac we have disabled this pointless type of warning.
incaseoftrouble commented 3 years ago

Mmmh. No, I like .of() everywhere—my idea was that .of() in an implementation gives you an object of that implementation, .of() in the interface gives you an immutable implementation.

I see. Fair enough. I'm just afraid that this might cause confusion because of name clashing.

I'm thinking of splitting the methods of OpenHashSet, adding an intermediate ImmutableOpenHashSet class that might be useful for these implementations. It's just a matter of moving the mutation methods and probably copy-and-paste some code for the iterator.

If we are doing this: Maybe aggregate the methods in a hidden "AbstractOpenHashSet" and add a public "ImmutableOpenHashSet" subclass as sibling of "OpenHashSet". Intermediate might be dangerous since then if (set instanceof ImmutableOpenHashSet) doesn't really do what it should. Additionally, the public immutables could have a shared ImmutableCollection marker interface, so if you have something generic like IntCollection you could check instanceof ImmutableCollection.

As for why we have argument overloads in addition to the varargs one. It is to avoid creating an array for cases that don't need an array. For example, if we didn't have no-arg overload, then List.of() and OpenHashSet.create() called with no arguments would create a new empty array each call that is going to be thrown away.

I'm aware of that, I think you misunderstood me. My comment was only about the naming, not that there are "empty" factory methods.

vigna commented 3 years ago

If we are doing this: Maybe aggregate the methods in a hidden "AbstractOpenHashSet" and add a public "ImmutableOpenHashSet" subclass as sibling of "OpenHashSet". Intermediate might be dangerous since then if (set instanceof ImmutableOpenHashSet) doesn't really do what it should. Additionally, the public immutables could have a shared ImmutableCollection marker interface, so if you have something generic like IntCollection you could check instanceof ImmutableCollection.

Yes, I was unclear but that was the idea. Also because I'd like to cache the hash code, as in Guava, so that OpenHahSets of immutable sets would have much faster access. That could be useful in a number of situations.

vigna commented 3 years ago

But I think all this should follow a merge with #120 , or we will get in a mess of conflicts.

techsy730 commented 3 years ago

Sounds good. And I need to also do a similar thing to BigList, BigArrayBigList, and OpenHashBigSet anyways (We don't have a BigSet interface, so I am wondering where we are going to put the immutable big set versions).

vigna commented 3 years ago

Well, a big set is just a big implementation of a set. Except for the unfortunate size() method returning an int, there's no method in Set that doesn't apply to big sets. So an OpenHashBitSet.of() method might be useful in weird circumstances, but definitely we do not need methods returning an immutable big set.

vigna commented 3 years ago

When are positioned now with this? Should I merge or there is something else you want to add?

techsy730 commented 3 years ago

Ideally I would like to add such methods to BigList and BigArrayBigList as well. I do need to clean up the @SafeVarargs stuff and documentation, bring it into line with Guava's policy. Or perhaps not use the array given as the backing array (clone it first, and ensure it is Object[] exactly for the object based versions), and use the array given as is in some wrap method (which some of them do, this would apply this convention consistently)

vigna commented 3 years ago

Yes, I would avoid copying as much as possible. What is Guava's policy? I see comments about "useless annotation".

vigna commented 3 years ago

OK, there are now type-specific ImmutableList types (just a backing array and access method). I had to do a little bit of code duplication, but, still, very little. I do not think it makes much sense to have an ImmutableBigList type.

I'm merging this and replace in List.of() the new immutable stuff. Apart for clarifying the @SafeVarargs stuff everything looks fine to me.