Closed GoogleCodeExporter closed 9 years ago
How would you implement Set<T>.contains(Object) in the resulting set? (Or if
that set
was not lazy, you wouldn't need this method). Well, you would need the inverse
function
T --> F as well (which would imply that such an inverse function *exists*). So
it's
lost more complicated than that, and I'm not sure that's the only issue.
Original comment by jim.andreou
on 13 Aug 2009 at 11:59
I didn't notice it's implemented lazy for list, sorry, only now I checked the
sources.
OK, that's probably too much complicated then.
Original comment by butko...@gmail.com
on 13 Aug 2009 at 12:11
Sets.transform isn't feasible. The problem is that the function could map two
distinct
elements in fromSet to the same value, which would violate the requirement that
all
elements in the returned set must be unique.
Instead, you can pass a set to Collections2.transform(). If you need a set at
output,
copy the collection generated by Collections2.transform() into a set.
Original comment by jared.l....@gmail.com
on 13 Aug 2009 at 1:57
To complement Jared response, he describes a situation where there wouldn't
exist an
inverse function.
But we should note that if there was a "Bijection" interface, which defined
both
directions of a function, such a transform method would be easily defined in
terms of
that.
Original comment by jim.andreou
on 13 Aug 2009 at 2:13
I have a use case for this, where I'm trying to implement a Map. I want the
keySet to
be a view over the EntrySet, not just a copy. In this (admittedly limited)
scenario
we have the advantage of being able to guarantee that the transformed elements
will
indeed be unique. The function here would simply be to pull out the key from the
entry. Of course, as Jim Andreou points out above, you'd need a little more
cleverness for contains(Object), remove etc, but these could be achieved by a
function that boils down to key->underlyingMap.getEntry(key).
The simplest solution is to extend AbstractSet and implement iterator() using
Iterators.transform, but then most operations degenerate to a linear trawl, and
I
can't easily delegate back to the entrySet/underlying map to do the work.
What would you recommend to implement this? Just a fuller implementation of
AbstractSet? Would there be any scope for requiring the inverse function to be
given?
You'd shift the burden for guaranteeing correctness to the implementation of the
functions, but would only need it to be valid over the domain of the contents
of the set.
Set<F> transform(Set<F> fromSet, Function<? super F,? extends T> function,
Function<?
extends T,? super F> inverse) // generics would need more thought
I realise that this would be a rather complex addition at this stage, but I'd be
interested to hear others' thoughts. My point is that though this won't work in
general, there is a class of cases where you can provide the required
guarantees such
that it could be well-defined and indeed useful.
Thanks,
Joe
Original comment by joe.kear...@gtempaccount.com
on 13 Aug 2009 at 2:29
Joe, what features do you need that AbstractMap.keySet() doesn't provide? A
fast
remove()? The ability to easily use the keySet() implementation without
extending
AbstractMap?
Original comment by cpov...@google.com
on 13 Aug 2009 at 3:12
Yes, sort of. I'm not using AbstractMap, but in any case that implementation of
keySet() just creates an AbstractSet over the keys in the entrySet iterator, so
most
operations are still linear.
We're not using an AbstractMap for slightly obscure memory reasons, we're
optimising
for a slightly unfortunate case we have where we see a vast number of small or
empty
maps. The keySet and entrySet fields from AbstractMap turned out to be a
noticeable
chunk of the memory used. It's an array-backed map akin to the LinearMap
suggestion
in another bug report. The reason I'm still looking for better-than-linear
performance in the key-set methods is that we're using a binary search over the
(Comparable) keys. Maybe LogarithmicMap would have been better...
I don't think it's particularly complex to just delegate everything back to the
underlying map and translate the details, it would just be nice to have a
utility to
do it for me. Then again, maybe this doesn't come up that often.
Original comment by joe.kear...@gtempaccount.com
on 13 Aug 2009 at 3:38
Reopening, because in the future we likely will have an invertible Function (we
call
it Converter, for better or worse), and when we do, this method is worth
considering.
It would simply have to document that your converter had darn well better be a
strict
bijection (so no String<->Double or things like that!).
Original comment by kevin...@gmail.com
on 4 Nov 2009 at 11:04
Upon further discussion, I'm coming to the same conclusion that I had a few
years ago
when Sets.transform() was first requested: nearly every conceptually invertible
function is lossy; truly lossless (*mathematically* invertible) ones are
vanishingly
rare. The result could be a broken Set. That doesn't mean we'll definitely
never do
Sets.transform(), but its priority feels low for this reason.
Original comment by kevinb@google.com
on 12 Feb 2010 at 5:06
note: depends on bug 177.
Original comment by kevinb@google.com
on 23 Apr 2010 at 8:27
Original comment by kevinb@google.com
on 30 Jul 2010 at 3:53
Original comment by fry@google.com
on 26 Jan 2011 at 10:11
@Kevinb: Developers may be aware that some sets are truly lossless, like
converting db entities into their ids. If sets happen to be lossy, simply
runtime exception could be thrown within Sets.transform() method. Sounds like a
reasonable solution to me.
Original comment by maliniakh
on 2 Feb 2011 at 1:04
We wouldn't be able to throw such an exception; we wouldn't know there was
anything wrong. It would just produce a buggy Set. Again, we're not rejecting
the idea and it will probably happen eventually.
Original comment by kevinb@google.com
on 2 Feb 2011 at 3:03
I think that's an example of where lossiness does not matter. If a DB entity
has been deleted, yes regenerating it from its id will fail, but then the
object must have been invalidated already (even if the fact was not known.)
Original comment by fin...@gmail.com
on 2 Feb 2011 at 4:42
Now that Converter has some traction in Google-internal code, we decided to
revisit this. I did a survey of callers of
{Iterables,Lists,Collections2}.transform(..., someConverter.asFunction()),
which is the closest analogue to the requested Sets.transform, to see whether
they might benefit from the new method.
The result is that Sets.transform is unnecessary or bug-promoting for all of
them. Here's a list of criteria that a caller would need to meet for the
method to be helpful:
- Their converter must be reversible in the mathematical sense. *Maybe* we're
OK if "01" and "1" map to the same thing (though I wouldn't want to be the one
to play fast and loose with user input in a webapp), but people love Converter
for converting between protocol buffers and richer POJOs. Why's that a
problem? Protocol buffers have a concept of "repeated" fields, which are
represented as a List. However, they're commonly used for data that would go
in a Set in the POJO. The result is that a protocol buffer containing [a,b]
and a protocol buffer containing [b,a] would be considered unequal, but both
would map to the same POJO. This is the sort of bug that could manifest itself
years after the code was written, e.g., during a JDK update to HashSet's
hashing algorithm, a change that the developer might be only dimly aware of.
- Part of this is that the conversion probably shouldn't have any invalid
inputs. If -1 is a valid Integer but not a valid UserId and my UserId
constructor throws IllegalArgumentException if it's provided, Sets.transform
can give me a set for which contains(-1) throws IllegalArgumentException. Ugh.
- The conversion probably has to be fast, since it will likely be performed
many times. The Converters I see in Google code generally aren't (e.g.,
protocol buffers again); as a result, many callers of transform() immediately
call ImmutableSet.copyOf() on the result, or they iterate over it a single time
and throw it away. ImmutableSet.copyOf() ensures that they won't be
transforming their protocol buffers over and over during inner loops that call
contains().
- The conversion probably has to be performed to many elements. If I'm
transforming just one or two, why not create the ImmutableSet copy, or why not
suffer the O(n) contains() already available to users of
Collections2.transform()?
- The consumer of the set must not need to perform a defensive copy, as many
constructors do. If I'm calling ImmutableSet.copyOf(), all I need as input is
an Iterable, which I can get from Iterables.transform().
- More generally, the caller needs to be planning to perform calls to
contains() (or perhaps remove()/add(), but that seems unlikely), as these are
the only operations that would be faster with Sets.transform() than with
Collections2.transform().
- Those contains() calls must be performed by someone other than the caller
himself. If the caller himself wants to perform them, it's often just as easy
to call set.contains(userId.toInteger()) n times instead of Sets.transform(set,
userIdToInteger) once and set.contains(userId) n times. Of course there's a
tipping point, but I didn't see it in my investigations.
I remain convinced that Sets.transform would be very cool -- I had a CL to
implement it myself 3 years ago, one of my first Java-libraries CLs -- but the
more research I do, the more convinced I become that it would cause more
problems than it solves.
I'm re-closing the bug.
The Converter class itself will still likely see the light of day.
Original comment by cpov...@google.com
on 17 Feb 2011 at 4:30
The comments below were written without seeing Chris' comments above. I agree
the contract required for the Sets#transform's bijection is tricky, but not as
tricky as described. I think the confusion comes from defining a bijection
between all of A and all of B, while we only require it between a Set<A> and
Set<B>, as I explain below, which is nowhere near as hard to implement
correctly as the former one.
(For me, the tricky part is this: Do we define a complete A<-->B bijection
type, and reduce its requirements in Sets#transform, or build a constrained
(Set<A> <--> Set<B>, i.e. something like "isDefinedAt(A)" method) bijection
type to begin with, with exactly the right spec for Sets#transform? I would
definitely choose the latter, which is as useful and general - it can mimick
the former just by implementing "isDefinedAt(A)" to return true, if the
implementor can truly upheld the contract)
---
Just a comment from my recent experience implementing something like a
Sets#transform: the tricky part is to specify the bijection contract. I mean
the exact semantics of the methods, not just the signatures.
For example:
public interface InvertibleFunction<A,B> extends Function<A,B> {
InvertableFunction<B,A> reverse();
}
This, without any other specification, implies a bijection A <--> B, i.e. from
*every* A to *every* B (oh, yes, A and B (infinite) sets should have the same
cardinality too). This specification makes it the easiest to implement
Sets#transform, but also the hardest to implement InvertibleFunction correctly.
At least from the perspective of Sets#transform, what is really needed is
something far less, and far easier to implement: a bijection not from *all* A
to all B, but one that needs only to be defined for those A's *in a specific
Set<A>*. The image of that would be a Set<B>, and similarly, the inverse
function would only be required to map *only* the B's in that Set<B> back to
A's of Set<A>.
InvertibleFunction as given above doesn't make it easy to restrict ourselves to
a smaller domain than the type parameter itself (Set<A> is just a subset of A).
Aside: scala functions have an "isDefinedAt(x: A)" method, which is precisely
what I'm talking about. They don't define functions over all A, but only a
subset of it (a predicate on elements of A, such as isDefinedAt, defines a set
of A just as Set<A> does, modulo not providing an iterator).
But we have no isDefinedAt method, and we can't define one, and even if we
could, interfaces with two methods are going to become much more costly than
those with a single method when closures arrive, so I think we are pretty much
stuck with the clumsier approach. The good news is that the clumsier approach
is workable, but the specification of a potential Sets#transform would be
forced to explain this subtlety, that not a complete bijection is required, but
only one defined for the given Set<A> and its image Set<B>, for other inputs it
would be free to return whatever.
Original comment by andr...@google.com
on 18 Mar 2011 at 10:21
"*Maybe* we're OK if "01" and "1" map to the same thing" -- indeed, this is
where nastiness begins. That the user can create a broken implementation in
which all of these hold:
1) Set<A> contains "01" and "1" (A = String)
2) Set<B> contains Integer(1) (B = Integer)
3) setA.size() == setB.size()
4) ...setB.iterator() appears to contain duplicates when iterated :-/
Which is subtle, but as the fundamental CS tenet goes, 'garbage in, garbage
out'...
Original comment by andr...@google.com
on 18 Mar 2011 at 10:38
The discussion on this issue is long and complex, with lots of good arguments
for and against using Sets.transform in specific scenarios. However, the
arguments around the necessity of a bijection seem to be a primary reason for
not including it the library at all. This confuses me a bit because it seems
like all that is really needed is an *injection* (aka a one-to-one (but not
onto) function).
Analogous to how Set refines Collection purely in terms of semantics (i.e.
Javadoc only), couldn't a new Injection interface refine Function semantically
as f(a) = f(b) implies a = b? This provides a well-defined Set with a
linear-time contains() (as provided by AbstractSet), which is good enough in
many common cases. I've done this myself, so I could easily provide code if
desired.
Adding a Bijection (which refines Injection) with an inverse function would
allow the transformed set to delegate contains() to the base set, preserving
its likely better than linear runtime. This could be provided as a
Sets.transform overload. I think the main trick here is that type erasure
doesn't allow contains(Object) to safely call inverse(F) (since you can't use
instanceof with a type parameter). Of course, that test could be included in
the Bijection interface, e.g. rangeContains(Object) or rangeCast(Object).
Is this line of thought reasonable, or am I missing something?
Original comment by tre...@scurrilous.com
on 27 Feb 2012 at 8:46
Note that Sets#transform is already implemented with a bijection, feel free to
peek in the code. (Used by something called "WellBehavedMap", if I recall
correctly)
But yes, this could have been implemented this with just an injection, and a
linear time Set<B>#contains. But that would be surprisingly slow, right? You do
expect Set#contains to be relatively fast.
Original comment by andr...@google.com
on 27 Feb 2012 at 11:01
I'm really leery of introducing a whole new Injection interface just for the
sake of this one method. I'm convinced by Chris' argument earlier that
Collections2.transform gives you something with the same performance
guarantees, and if you really want a Set -- you probably really want the Set
because you want a fast contains, so just copy it into an ImmutableSet.
Original comment by wasserman.louis
on 27 Feb 2012 at 11:11
Unless, of course, you have to keep the transformed set in sync with the
original, and you plan to do enough contains() checks to worth the cost to
construct an ImmutableSet. (Just framing this a bit better)
Original comment by andr...@google.com
on 27 Feb 2012 at 11:30
@andreou: Heh, thanks, I never thought to look for an internal implementation.
Not to you personally, but reflectively for all: it's a little disingenuous to
argue against the need for/validity of something while using it privately, no?
;-)
@louis: No, I promise, I really don't want fast contains(). :-) I think there
are many common usages that don't ever need to call contains() and primarily
use size() and iterator(). However, Collection isn't always an option because
of the need to interoperate with APIs that use Set because of its semantics
(i.e. no duplicates), without expectations of the performance of contains().
Consider java.util.concurrent.CopyOnWriteArraySet as an example. Copying into
ImmutableSet has worse performance in such cases, because each construction
redundantly verifies the uniqueness constraint.
I really find the pushback surprising here, given that:
a) it's definitely useful/preferred in some cases (e.g. WellBehavedMap),
b) it's already implemented (and almost trivially, at that),
c) given an interface more specialized than Function (be it InvertibleFunction
or Injection or other), it's pretty hard for someone to shoot himself in the
foot.
Original comment by tre...@scurrilous.com
on 28 Feb 2012 at 12:22
Trevor, it would be funny if you directed that to me personally; I was arguing
in favor of this. :) (#17 sums up the story of writing this)
Original comment by andr...@google.com
on 28 Feb 2012 at 12:31
Another way to convert your Collections2.transform Collection to a Set:
class ForwardingCollectionSet extends ForwardingCollection implements Set {
protected Collection delegate() { return delegate; }
}
Original comment by cpov...@google.com
on 28 Feb 2012 at 3:48
...except you need to override hashCode and equals, but other than that, Chris
is correct. (That is the easiest way to view a Collection known to be unique
as a Set, and it _is_ quite easy.)
My main objection is the increased API surface -- creating an interface just
for the sake of one method seems a bit much.
Original comment by wasserman.louis
on 28 Feb 2012 at 4:23
That's not a terrible solution, but overriding hashCode and equals (correctly)
seems like a subtlety that cries out to be done carefully in a library such as
this, rather than being done ad hoc.
Louis, I understand that you need to worry about API surface in general, but
that argument seems like an oversimplification in this case:
public interface Injection<F, T> extends Function<F, T> {}
Other than Javadoc, that's it. It hardly seems like a maintenance problem, and
I'm happy to submit a patch including the Javadoc.
Speaking of subtlety, I'd like to point out a slightly different argument in
favor of this feature: Obviously people are reaching for Sets.transform.
Finding it missing, they likely end up here. If ImmutableSet.copyOf solves
their situation, they're on their way in 20 mintues. Otherwise, they spend a
day trying to understand this thread (and possibly want to rehash it) and then
kludge together a solution that likely misses a few of the subtleties.
Distilling an overview of these issues into a single static method and empty
interface (and accompanying Javadoc) seems worth even more than the actual
code, even if it encourages them to use ImmutableSet in the end anyway.
Original comment by tre...@scurrilous.com
on 28 Feb 2012 at 9:21
Let me put it another way: there's already a decent amount of momentum behind
Converter, which you describe as Bijection. I'd be absolutely okay with
waiting for Converter to get off the ground and then providing Sets.transform
after that.
I'd like to hear opinions from other Guava team members, though.
Original comment by wasserman.louis
on 28 Feb 2012 at 9:25
Indeed, my mistake makes the ForwardingCollection solution look bad, and it's
not something I'd so much "recommend" as "fall back on if necessary."
The main problem with Sets.transform remains that it doesn't work right in the
main cases that people seem to want it to work -- parsing numbers, for
instance. Dimitris and I disagree over whether it's even a good fit for
WellBehavedMap. (That use, too, had a subtle bug, fortunately caught by
another reviewer.) Plus, ImmutableSet appeared to be an adequate solution for
all the potential users we looked at. There will be users for whom this is not
true, but overall I suspect that the method would cause enough bugs to
outweight any improvements it can produce.
People are welcome to continue to discuss, but we have had this discussion
repeatedly over the years, and I'm trying my best not to sink a few hours into
it for the nth time.
I'll update the bug title to make the workaround clear.
Original comment by cpov...@google.com
on 28 Feb 2012 at 9:36
Original comment by cpov...@google.com
on 28 Feb 2012 at 9:45
"Dimitris and I disagree over whether it's even a good fit for WellBehavedMap"
This is the first time I learn about this.
There are two reasons why WellBehavedMap can't use the suggested
"ImmutableSet.copyOf(Collections2.transform(...))"
1) The raison d'etre of EnumMultiset, EnumBiMap, EnumHashBiMap is higher
iteration performance; otherwise a user could get the same functionality out of
HashMultiset and HashBiMap. (The source of the bug, after all, was EnumMap
overdoing it with performance considerations over its entrySet()). Constructing
an ImmutableSet(Collections2.transform(...)) of the whole structure in
entrySet() of these structures would just defeat their point.
2) entrySet() is supposed to be a live view anyway. So we wouldn't even hope to
get away with doing a _single_ copy of the entire structure - after every
update, we would have to drop any constructed ImmutableSet, to recreate it the
next time. Let alone the extra memory retention that this shadow copy of the
whole structure implies.
Sets#transform is really the best fit for WellBehavedMap - if you disagree,
please provide data to back up that opinion.
Original comment by andr...@google.com
on 28 Feb 2012 at 10:30
Right -- I didn't mean that we should use ImmutableSet.copyOf(...) there,
merely that I didn't think a general-purpose Sets.transform was necessary --
and, it seems, it actually distracted from the need for a live view. (Or maybe
I'm just making excuses for not spotting the problem :))
Original comment by cpov...@google.com
on 28 Feb 2012 at 10:36
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:16
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:10
Original issue reported on code.google.com by
butko...@gmail.com
on 13 Aug 2009 at 11:52