Closed GoogleCodeExporter closed 9 years ago
I'm of the opinion that the current behavior of methods like containsAll and
removeAll is exactly what one would expect (particularly given that
AbstractCollection does the exact same thing) and that wrapping the input in a
HashSet before passing it to the method if you want to guarantee linear
complexity is simple, clear and gives the user the choice in the tradeoff
between time complexity and using extra space. I'm tempted to mark this as
WorkingAsIntended, but I'll let others chime in if they have any opinions.
Original comment by cgdec...@gmail.com
on 12 Sep 2012 at 2:49
I concur with Colin. This is WorkingAsIntended.
Original comment by lowas...@google.com
on 12 Sep 2012 at 2:57
> wrapping the input in a HashSet before passing it to the method if
> you want to guarantee linear complexity is simple
You are right, it's simple to fix. However, developers don't think at
the complexity of each method *call* each time they *type* one.
Coding like this would impossible. In fact, libraries should simplify
coding and hide complexity from developers.
If we can make a slow method fast, why not do it? Especially when the
fix is one line of code in the library, and it hides complexity from
the developer.
Original comment by adi1756...@gmail.com
on 12 Sep 2012 at 3:29
I don't think any of us can guarantee that there aren't users, in the wild,
passing TreeSets to this method -- with comparators that aren't consistent with
equals.
Are those users being a little bit naughty? Yes. Do those users deserve to be
broken? I don't think so.
Even worse, those users couldn't figure out their problem without going into
the Guava source, which makes me deeply uncomfortable. Users should never
*have* to go into the Guava source to figure out their problem.
Original comment by lowas...@google.com
on 12 Sep 2012 at 4:11
Related to this topic, it would be good to add a LazyHashSet class in
Guava. This would help developers optimize their code from the call
site, like you suggested. A LazyHashSet class would help both in
Issue 1144 and in Issue 1085.
Currently there is no way to populate a hash lazily from the call site
(like in patchFull.diff). However, populating the hash lazily has
higher performance than populating it eagerly (see Comment 11 in Issue
1085).
For example, with a LazyHashSet class, the call in Issue 1144 would
look like:
"Iterators.removeAll(removeFrom, new LazyHashSet(elementsToRemove))".
Original comment by adi1756...@gmail.com
on 12 Sep 2012 at 4:23
From Comment 4:
> passing TreeSets to this method
TreeSet is not a List. The patch explicitly checks if
"elementsToRemove" is a List, and only then the patch triggers the
fast execution path (that uses the HashSet). Therefore, passing
TreeSets works fine because it executes the current code.
Original comment by adi1756...@gmail.com
on 12 Sep 2012 at 7:48
(Also, thanks to protobuf we also have to consider the (rare) case of elements
that don't implement hashCode at all.)
Original comment by kevinb@google.com
on 12 Sep 2012 at 10:15
If it's difficult to fix in the callee, maybe we should explicitly
document this quadratic behavior in the method documentation. Here is
an example inspired from ListUtils in Apache Collections:
......................................................................
* <p>
* This implementation iterates over <code>removeFrom</code>, checking
* each element in turn to see if it's contained in
* <code>elementsToRemove</code>. If it's contained, it's removed
* from <code>removeFrom</code>. As a consequence, it is advised to
* use a collection type for <code>elementsToRemove</code> that
* provides a fast (e.g. O(1)) implementation of
* {@link Collection#contains(Object)}.
......................................................................
Also, can you add the LazyHashSet class (Comment 5), such that the
caller can build the hash lazily? Right now, the user is forced to
build the HashSet eagerly, which is slower (up to 60X slower, Issue
1085 Comment 11) than building it lazily.
Original comment by adi1756...@gmail.com
on 12 Sep 2012 at 10:52
Certainly not without doing more thorough (and reliable) benchmarking than the
technique used in that comment, I suspect. Can you put together a benchmark
with Caliper? (https://code.google.com/p/caliper/)
Original comment by lowas...@google.com
on 12 Sep 2012 at 11:41
Yes, sure, I will upload one in a few days, maybe a week.
Original comment by adi1756...@gmail.com
on 13 Sep 2012 at 10:42
I uploaded a Caliper benchmark (Issue 1085 Comment 13) similar to the
non-Caliper benchmark in Issue 1085 Comment 11. The results are the
same as for the non-Caliper version: the code using a lazy hash set is
faster then both the code using an eager hash set and the code using
the current quadratic implementation (full details in Issue 1085
Comment 11). This quantitative data shows it makes sense to allow
developers to speed up their code using a LazyHashSet.
Original comment by adi1756...@gmail.com
on 25 Sep 2012 at 10:32
I attached a sketch for the LazyHashSet class that implements lazy
"contains" and "containsAll". This is an example usage:
"LazyHashSet.create(list).containsAll(other);"
instead of:
"list.containsAll(other)"
Original comment by adi1756...@gmail.com
on 2 Oct 2012 at 2:23
Attachments:
We intend for these methods to do the predictable thing. If anything more
complex is required, we suggest using your own implementations.
Original comment by kak@google.com
on 22 Aug 2013 at 10:14
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:13
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:08
Original issue reported on code.google.com by
adi1756...@gmail.com
on 12 Sep 2012 at 1:56Attachments: