kucci / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Sets.transform(Set<F>, Converter<F, T>) or the equivalent - WontFix: Use ImmutableSet.copyOf(Collections2.transform(...)) #219

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'd like to use method: 
Sets.transform(Set<F> fromSet, Function<? super F,? extends T> function)

But there is no such method. I would expect it would behave the way similar
to one in List.transform()

thanks.

Peter B.

Original issue reported on code.google.com by butko...@gmail.com on 13 Aug 2009 at 11:52

GoogleCodeExporter commented 9 years ago
How would you implement Set<T>.contains(Object) in the resulting set? (Or if 
that set 
was not lazy, you wouldn't need this method). Well, you would need the inverse 
function 
T --> F as well (which would imply that such an inverse function *exists*). So 
it's 
lost more complicated than that, and I'm not sure that's the only issue.

Original comment by jim.andreou on 13 Aug 2009 at 11:59

GoogleCodeExporter commented 9 years ago
I didn't notice it's implemented lazy for list, sorry, only now I checked the 
sources.
OK, that's probably too much complicated then.

Original comment by butko...@gmail.com on 13 Aug 2009 at 12:11

GoogleCodeExporter commented 9 years ago
Sets.transform isn't feasible. The problem is that the function could map two 
distinct 
elements in fromSet to the same value, which would violate the requirement that 
all 
elements in the returned set must be unique.

Instead, you can pass a set to Collections2.transform(). If you need a set at 
output, 
copy the collection generated by Collections2.transform() into a set.

Original comment by jared.l....@gmail.com on 13 Aug 2009 at 1:57

GoogleCodeExporter commented 9 years ago
To complement Jared response, he describes a situation where there wouldn't 
exist an 
inverse function.

But we should note that if there was a "Bijection" interface, which defined 
both 
directions of a function, such a transform method would be easily defined in 
terms of 
that. 

Original comment by jim.andreou on 13 Aug 2009 at 2:13

GoogleCodeExporter commented 9 years ago
I have a use case for this, where I'm trying to implement a Map. I want the 
keySet to
be a view over the EntrySet, not just a copy. In this (admittedly limited) 
scenario
we have the advantage of being able to guarantee that the transformed elements 
will
indeed be unique. The function here would simply be to pull out the key from the
entry. Of course, as Jim Andreou points out above, you'd need a little more
cleverness for contains(Object), remove etc, but these could be achieved by a
function that boils down to key->underlyingMap.getEntry(key).

The simplest solution is to extend AbstractSet and implement iterator() using
Iterators.transform, but then most operations degenerate to a linear trawl, and 
I
can't easily delegate back to the entrySet/underlying map to do the work.

What would you recommend to implement this? Just a fuller implementation of
AbstractSet? Would there be any scope for requiring the inverse function to be 
given?
You'd shift the burden for guaranteeing correctness to the implementation of the
functions, but would only need it to be valid over the domain of the contents 
of the set.

Set<F> transform(Set<F> fromSet, Function<? super F,? extends T> function, 
Function<?
extends T,? super F> inverse) // generics would need more thought

I realise that this would be a rather complex addition at this stage, but I'd be
interested to hear others' thoughts. My point is that though this won't work in
general, there is a class of cases where you can provide the required 
guarantees such
that it could be well-defined and indeed useful.

Thanks,
Joe

Original comment by joe.kear...@gtempaccount.com on 13 Aug 2009 at 2:29

GoogleCodeExporter commented 9 years ago
Joe, what features do you need that AbstractMap.keySet() doesn't provide?  A 
fast
remove()?  The ability to easily use the keySet() implementation without 
extending
AbstractMap?

Original comment by cpov...@google.com on 13 Aug 2009 at 3:12

GoogleCodeExporter commented 9 years ago
Yes, sort of. I'm not using AbstractMap, but in any case that implementation of
keySet() just creates an AbstractSet over the keys in the entrySet iterator, so 
most
operations are still linear.

We're not using an AbstractMap for slightly obscure memory reasons, we're 
optimising
for a slightly unfortunate case we have where we see a vast number of small or 
empty
maps. The keySet and entrySet fields from AbstractMap turned out to be a 
noticeable
chunk of the memory used. It's an array-backed map akin to the LinearMap 
suggestion
in another bug report. The reason I'm still looking for better-than-linear
performance in the key-set methods is that we're using a binary search over the
(Comparable) keys. Maybe LogarithmicMap would have been better...

I don't think it's particularly complex to just delegate everything back to the
underlying map and translate the details, it would just be nice to have a 
utility to
do it for me. Then again, maybe this doesn't come up that often.

Original comment by joe.kear...@gtempaccount.com on 13 Aug 2009 at 3:38

GoogleCodeExporter commented 9 years ago
Reopening, because in the future we likely will have an invertible Function (we 
call 
it Converter, for better or worse), and when we do, this method is worth 
considering. 
It would simply have to document that your converter had darn well better be a 
strict 
bijection (so no String<->Double or things like that!).

Original comment by kevin...@gmail.com on 4 Nov 2009 at 11:04

GoogleCodeExporter commented 9 years ago
Upon further discussion, I'm coming to the same conclusion that I had a few 
years ago 
when Sets.transform() was first requested: nearly every conceptually invertible 
function is lossy; truly lossless (*mathematically* invertible) ones are 
vanishingly 
rare.  The result could be a broken Set.  That doesn't mean we'll definitely 
never do 
Sets.transform(), but its priority feels low for this reason.

Original comment by kevinb@google.com on 12 Feb 2010 at 5:06

GoogleCodeExporter commented 9 years ago
note: depends on bug 177.

Original comment by kevinb@google.com on 23 Apr 2010 at 8:27

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 30 Jul 2010 at 3:53

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 26 Jan 2011 at 10:11

GoogleCodeExporter commented 9 years ago
@Kevinb: Developers may be aware that some sets are truly lossless, like 
converting db entities into their ids. If sets happen to be lossy, simply 
runtime exception could be thrown within Sets.transform() method. Sounds like a 
reasonable solution to me.

Original comment by maliniakh on 2 Feb 2011 at 1:04

GoogleCodeExporter commented 9 years ago
We wouldn't be able to throw such an exception; we wouldn't know there was 
anything wrong.  It would just produce a buggy Set.  Again, we're not rejecting 
the idea and it will probably happen eventually.

Original comment by kevinb@google.com on 2 Feb 2011 at 3:03

GoogleCodeExporter commented 9 years ago
I think that's an example of where lossiness does not matter.  If a DB entity 
has been deleted, yes regenerating it from its id will fail, but then the 
object must have been invalidated already (even if the fact was not known.)

Original comment by fin...@gmail.com on 2 Feb 2011 at 4:42

GoogleCodeExporter commented 9 years ago
Now that Converter has some traction in Google-internal code, we decided to 
revisit this.  I did a survey of callers of 
{Iterables,Lists,Collections2}.transform(..., someConverter.asFunction()), 
which is the closest analogue to the requested Sets.transform, to see whether 
they might benefit from the new method.

The result is that Sets.transform is unnecessary or bug-promoting for all of 
them.  Here's a list of criteria that a caller would need to meet for the 
method to be helpful:

- Their converter must be reversible in the mathematical sense.  *Maybe* we're 
OK if "01" and "1" map to the same thing (though I wouldn't want to be the one 
to play fast and loose with user input in a webapp), but people love Converter 
for converting between protocol buffers and richer POJOs.  Why's that a 
problem?  Protocol buffers have a concept of "repeated" fields, which are 
represented as a List.  However, they're commonly used for data that would go 
in a Set in the POJO.  The result is that a protocol buffer containing [a,b] 
and a protocol buffer containing [b,a] would be considered unequal, but both 
would map to the same POJO.  This is the sort of bug that could manifest itself 
years after the code was written, e.g., during a JDK update to HashSet's 
hashing algorithm, a change that the developer might be only dimly aware of.

- Part of this is that the conversion probably shouldn't have any invalid 
inputs.  If -1 is a valid Integer but not a valid UserId and my UserId 
constructor throws IllegalArgumentException if it's provided, Sets.transform 
can give me a set for which contains(-1) throws IllegalArgumentException.  Ugh.

- The conversion probably has to be fast, since it will likely be performed 
many times.  The Converters I see in Google code generally aren't (e.g., 
protocol buffers again); as a result, many callers of transform() immediately 
call ImmutableSet.copyOf() on the result, or they iterate over it a single time 
and throw it away.  ImmutableSet.copyOf() ensures that they won't be 
transforming their protocol buffers over and over during inner loops that call 
contains().

- The conversion probably has to be performed to many elements.  If I'm 
transforming just one or two, why not create the ImmutableSet copy, or why not 
suffer the O(n) contains() already available to users of 
Collections2.transform()?

- The consumer of the set must not need to perform a defensive copy, as many 
constructors do.  If I'm calling ImmutableSet.copyOf(), all I need as input is 
an Iterable, which I can get from Iterables.transform().

- More generally, the caller needs to be planning to perform calls to 
contains() (or perhaps remove()/add(), but that seems unlikely), as these are 
the only operations that would be faster with Sets.transform() than with 
Collections2.transform().

- Those contains() calls must be performed by someone other than the caller 
himself.  If the caller himself wants to perform them, it's often just as easy 
to call set.contains(userId.toInteger()) n times instead of Sets.transform(set, 
userIdToInteger) once and set.contains(userId) n times.  Of course there's a 
tipping point, but I didn't see it in my investigations.

I remain convinced that Sets.transform would be very cool -- I had a CL to 
implement it myself 3 years ago, one of my first Java-libraries CLs -- but the 
more research I do, the more convinced I become that it would cause more 
problems than it solves.

I'm re-closing the bug.

The Converter class itself will still likely see the light of day.

Original comment by cpov...@google.com on 17 Feb 2011 at 4:30

GoogleCodeExporter commented 9 years ago
The comments below were written without seeing Chris' comments above. I agree 
the contract required for the Sets#transform's bijection is tricky, but not as 
tricky as described. I think the confusion comes from defining a bijection 
between all of A and all of B, while we only require it between a Set<A> and 
Set<B>, as I explain below, which is nowhere near as hard to implement 
correctly as the former one. 

(For me, the tricky part is this: Do we define a complete A<-->B bijection 
type, and reduce its requirements in Sets#transform, or build a constrained 
(Set<A> <--> Set<B>, i.e. something like "isDefinedAt(A)" method) bijection 
type to begin with, with exactly the right spec for Sets#transform? I would 
definitely choose the latter, which is as useful and general - it can mimick 
the former just by implementing "isDefinedAt(A)" to return true, if the 
implementor can truly upheld the contract)

---
Just a comment from my recent experience implementing something like a 
Sets#transform: the tricky part is to specify the bijection contract. I mean 
the exact semantics of the methods, not just the signatures.

For example:
public interface InvertibleFunction<A,B> extends Function<A,B> {
    InvertableFunction<B,A> reverse();
}

This, without any other specification, implies a bijection A <--> B, i.e. from 
*every* A to *every* B (oh, yes, A and B (infinite) sets should have the same 
cardinality too). This specification makes it the easiest to implement 
Sets#transform, but also the hardest to implement InvertibleFunction correctly. 

At least from the perspective of Sets#transform, what is really needed is 
something far less, and far easier to implement: a bijection not from *all* A 
to all B, but one that needs only to be defined for those A's *in a specific 
Set<A>*. The image of that would be a Set<B>, and similarly, the inverse 
function would only be required to map *only* the B's in that Set<B> back to 
A's of Set<A>. 

InvertibleFunction as given above doesn't make it easy to restrict ourselves to 
a smaller domain than the type parameter itself (Set<A> is just a subset of A). 

Aside: scala functions have an "isDefinedAt(x: A)" method, which is precisely 
what I'm talking about. They don't define functions over all A, but only a 
subset of it (a predicate on elements of A, such as isDefinedAt, defines a set 
of A just as Set<A> does, modulo not providing an iterator). 

But we have no isDefinedAt method, and we can't define one, and even if we 
could, interfaces with two methods are going to become much more costly than 
those with a single method when closures arrive, so I think we are pretty much 
stuck with the clumsier approach. The good news is that the clumsier approach 
is workable, but the specification of a potential Sets#transform would be 
forced to explain this subtlety, that not a complete bijection is required, but 
only one defined for the given Set<A> and its image Set<B>, for other inputs it 
would be free to return whatever. 

Original comment by andr...@google.com on 18 Mar 2011 at 10:21

GoogleCodeExporter commented 9 years ago
"*Maybe* we're OK if "01" and "1" map to the same thing" -- indeed, this is 
where nastiness begins. That the user can create a broken implementation in 
which all of these hold:

1) Set<A> contains "01" and "1" (A = String)
2) Set<B> contains Integer(1) (B = Integer)
3) setA.size() == setB.size()
4) ...setB.iterator() appears to contain duplicates when iterated :-/

Which is subtle, but as the fundamental CS tenet goes, 'garbage in, garbage 
out'...

Original comment by andr...@google.com on 18 Mar 2011 at 10:38

GoogleCodeExporter commented 9 years ago
The discussion on this issue is long and complex, with lots of good arguments 
for and against using Sets.transform in specific scenarios. However, the 
arguments around the necessity of a bijection seem to be a primary reason for 
not including it the library at all. This confuses me a bit because it seems 
like all that is really needed is an *injection* (aka a one-to-one (but not 
onto) function).

Analogous to how Set refines Collection purely in terms of semantics (i.e. 
Javadoc only), couldn't a new Injection interface refine Function semantically 
as f(a) = f(b) implies a = b? This provides a well-defined Set with a 
linear-time contains() (as provided by AbstractSet), which is good enough in 
many common cases. I've done this myself, so I could easily provide code if 
desired.

Adding a Bijection (which refines Injection) with an inverse function would 
allow the transformed set to delegate contains() to the base set, preserving 
its likely better than linear runtime. This could be provided as a 
Sets.transform overload. I think the main trick here is that type erasure 
doesn't allow contains(Object) to safely call inverse(F) (since you can't use 
instanceof with a type parameter). Of course, that test could be included in 
the Bijection interface, e.g. rangeContains(Object) or rangeCast(Object).

Is this line of thought reasonable, or am I missing something?

Original comment by tre...@scurrilous.com on 27 Feb 2012 at 8:46

GoogleCodeExporter commented 9 years ago
Note that Sets#transform is already implemented with a bijection, feel free to 
peek in the code. (Used by something called "WellBehavedMap", if I recall 
correctly)

But yes, this could have been implemented this with just an injection, and a 
linear time Set<B>#contains. But that would be surprisingly slow, right? You do 
expect Set#contains to be relatively fast.

Original comment by andr...@google.com on 27 Feb 2012 at 11:01

GoogleCodeExporter commented 9 years ago
I'm really leery of introducing a whole new Injection interface just for the 
sake of this one method.  I'm convinced by Chris' argument earlier that 
Collections2.transform gives you something with the same performance 
guarantees, and if you really want a Set -- you probably really want the Set 
because you want a fast contains, so just copy it into an ImmutableSet.

Original comment by wasserman.louis on 27 Feb 2012 at 11:11

GoogleCodeExporter commented 9 years ago
Unless, of course, you have to keep the transformed set in sync with the 
original, and you plan to do enough contains() checks to worth the cost to 
construct an ImmutableSet. (Just framing this a bit better)

Original comment by andr...@google.com on 27 Feb 2012 at 11:30

GoogleCodeExporter commented 9 years ago
@andreou: Heh, thanks, I never thought to look for an internal implementation. 
Not to you personally, but reflectively for all: it's a little disingenuous to 
argue against the need for/validity of something while using it privately, no? 
;-)

@louis: No, I promise, I really don't want fast contains(). :-) I think there 
are many common usages that don't ever need to call contains() and primarily 
use size() and iterator(). However, Collection isn't always an option because 
of the need to interoperate with APIs that use Set because of its semantics 
(i.e. no duplicates), without expectations of the performance of contains(). 
Consider java.util.concurrent.CopyOnWriteArraySet as an example. Copying into 
ImmutableSet has worse performance in such cases, because each construction 
redundantly verifies the uniqueness constraint.

I really find the pushback surprising here, given that:
a) it's definitely useful/preferred in some cases (e.g. WellBehavedMap),
b) it's already implemented (and almost trivially, at that),
c) given an interface more specialized than Function (be it InvertibleFunction 
or Injection or other), it's pretty hard for someone to shoot himself in the 
foot.

Original comment by tre...@scurrilous.com on 28 Feb 2012 at 12:22

GoogleCodeExporter commented 9 years ago
Trevor, it would be funny if you directed that to me personally; I was arguing 
in favor of this. :) (#17 sums up the story of writing this)

Original comment by andr...@google.com on 28 Feb 2012 at 12:31

GoogleCodeExporter commented 9 years ago
Another way to convert your Collections2.transform Collection to a Set:

class ForwardingCollectionSet extends ForwardingCollection implements Set {
  protected Collection delegate() { return delegate; }
}

Original comment by cpov...@google.com on 28 Feb 2012 at 3:48

GoogleCodeExporter commented 9 years ago
...except you need to override hashCode and equals, but other than that, Chris 
is correct.  (That is the easiest way to view a Collection known to be unique 
as a Set, and it _is_ quite easy.)

My main objection is the increased API surface -- creating an interface just 
for the sake of one method seems a bit much.

Original comment by wasserman.louis on 28 Feb 2012 at 4:23

GoogleCodeExporter commented 9 years ago
That's not a terrible solution, but overriding hashCode and equals (correctly) 
seems like a subtlety that cries out to be done carefully in a library such as 
this, rather than being done ad hoc.

Louis, I understand that you need to worry about API surface in general, but 
that argument seems like an oversimplification in this case:

  public interface Injection<F, T> extends Function<F, T> {}

Other than Javadoc, that's it. It hardly seems like a maintenance problem, and 
I'm happy to submit a patch including the Javadoc.

Speaking of subtlety, I'd like to point out a slightly different argument in 
favor of this feature: Obviously people are reaching for Sets.transform. 
Finding it missing, they likely  end up here. If ImmutableSet.copyOf solves 
their situation, they're on their way in 20 mintues. Otherwise, they spend a 
day trying to understand this thread (and possibly want to rehash it) and then 
kludge together a solution that likely misses a few of the subtleties. 
Distilling an overview of these issues into a single static method and empty 
interface (and accompanying Javadoc) seems worth even more than the actual 
code, even if it encourages them to use ImmutableSet in the end anyway.

Original comment by tre...@scurrilous.com on 28 Feb 2012 at 9:21

GoogleCodeExporter commented 9 years ago
Let me put it another way: there's already a decent amount of momentum behind 
Converter, which you describe as Bijection.  I'd be absolutely okay with 
waiting for Converter to get off the ground and then providing Sets.transform 
after that.

I'd like to hear opinions from other Guava team members, though.

Original comment by wasserman.louis on 28 Feb 2012 at 9:25

GoogleCodeExporter commented 9 years ago
Indeed, my mistake makes the ForwardingCollection solution look bad, and it's 
not something I'd so much "recommend" as "fall back on if necessary."

The main problem with Sets.transform remains that it doesn't work right in the 
main cases that people seem to want it to work -- parsing numbers, for 
instance.  Dimitris and I disagree over whether it's even a good fit for 
WellBehavedMap.  (That use, too, had a subtle bug, fortunately caught by 
another reviewer.)  Plus, ImmutableSet appeared to be an adequate solution for 
all the potential users we looked at.  There will be users for whom this is not 
true, but overall I suspect that the method would cause enough bugs to 
outweight any improvements it can produce.

People are welcome to continue to discuss, but we have had this discussion 
repeatedly over the years, and I'm trying my best not to sink a few hours into 
it for the nth time.

I'll update the bug title to make the workaround clear.

Original comment by cpov...@google.com on 28 Feb 2012 at 9:36

GoogleCodeExporter commented 9 years ago

Original comment by cpov...@google.com on 28 Feb 2012 at 9:45

GoogleCodeExporter commented 9 years ago
"Dimitris and I disagree over whether it's even a good fit for WellBehavedMap"

This is the first time I learn about this.

There are two reasons why WellBehavedMap can't use the suggested 
"ImmutableSet.copyOf(Collections2.transform(...))"
1) The raison d'etre of EnumMultiset, EnumBiMap, EnumHashBiMap is higher 
iteration performance; otherwise a user could get the same functionality out of 
HashMultiset and HashBiMap. (The source of the bug, after all, was EnumMap 
overdoing it with performance considerations over its entrySet()). Constructing 
an ImmutableSet(Collections2.transform(...)) of the whole structure in 
entrySet() of these structures would just defeat their point.

2) entrySet() is supposed to be a live view anyway. So we wouldn't even hope to 
get away with doing a _single_ copy of the entire structure - after every 
update, we would have to drop any constructed ImmutableSet, to recreate it the 
next time. Let alone the extra memory retention that this shadow copy of the 
whole structure implies.

Sets#transform is really the best fit for WellBehavedMap - if you disagree, 
please provide data to back up that opinion.

Original comment by andr...@google.com on 28 Feb 2012 at 10:30

GoogleCodeExporter commented 9 years ago
Right -- I didn't mean that we should use ImmutableSet.copyOf(...) there, 
merely that I didn't think a general-purpose Sets.transform was necessary -- 
and, it seems, it actually distracted from the need for a live view.  (Or maybe 
I'm just making excuses for not spotting the problem :))

Original comment by cpov...@google.com on 28 Feb 2012 at 10:36

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:16

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:10