Collection.removeAll and retainAll can throw NullPointerException even if annotated conservatively

cpovirk commented 4 years ago

From their docs (emphasis mine):

Throws: ... NullPointerException - if this collection contains one or more null elements and the specified collection does not support null elements (optional), or if the specified collection is null

That means that it's often safe to call removeAll and retainAll only if:

the receiver collection has a non-null element type (currently not expressible; see #3203 ), or
if the argument collection supports calls to contains(null) (which seems harder to express, short of permitting calls to removeAll and retainAll only if the argument itself has a null element type)

Putting those together, maybe what we actually want is something like the following...?

<E extends @PolyNull Object> boolean removeAll(Collection<@PolyNull ? extends Object> c);

If I'm understanding right, that may still be more conservative than we'd like (since it ought to be safe to pass, e.g., an ArrayList<@NonNull String>, which supports null queries even though it doesn't contain nulls), but it may be an improvement.

Here's a way to get NPE with the current signatures:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import org.checkerframework.checker.nullness.qual.Nullable;

public class QueryNull {
  public static void main(String[] args) {
    List<@Nullable String> c = new ArrayList<@Nullable String>();
    c.add(null);
    Set<String> toRemove = new TreeSet<String>();
    c.removeAll(toRemove);
  }
}

cpovirk commented 4 years ago

It occurs to me that the docs for Collection.removeAll technically don't even allow the collection to throw if it doesn't support null: The only case mentioned is if the other collection doesn't support null (or the other collection is null).

Even AbstractSet.removeAll does not document such an exception, even though its implementation (which is inherited by TreeSet) can be used to produce a NullPointerException:

import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import org.checkerframework.checker.nullness.qual.Nullable;

public class QueryNull {
  public static void main(String[] args) {
    TreeSet<String> c = new TreeSet<String>();
    c.add("foo");
    c.add("bar");
    Set<@Nullable String> toRemove = new HashSet<@Nullable String>();
    toRemove.add(null);
    c.removeAll(toRemove);
  }
}

At least this additional problem is more easily avoidable by making containsAll require a Collection<? extends @NonNull Object>.

This means that my @PolyNull proposal above is wrong. The only guaranteed safe calls to Collection.removeAll may be when either both collections have nullable element types or both have non-nullable element types. So maybe what we'd want is:

<E extends @PolyNull Object> boolean removeAll(Collection<@PolyNull ?> c);

cpovirk commented 4 years ago

(though note that there have been discussions about changing the behavior of AbstractSet.removeAll that would eliminate this case -- at least for that particular method, which might be the only place it comes up in the JDK itself?)

cpovirk commented 4 years ago

Sorry, I didn't write this clearly. I'll try to add some more info.

This bug report makes the most sense in light of the implementation requirements of AbstractCollection.removeAll:

This implementation iterates over this collection, checking each element returned by the iterator in turn to see if it's contained in the specified collection.

This explains why Collection.removeAll has the @throws clause that it does. The important fact is that receiver.removeAll(c) may invoke c.contains(element) for each element of receiver. (This is backward from how containsAll works.) Because contains doesn't necessarily allow null arguments, this may throw.

The safest way to ensure that c.contains allows null arguments (necessary if receiver has a nullable element type) is to require c to be a Collection<@Nullable ?>. But...

Then that brings us to the problem specific to AbstractSet.removeAll: That implementation may perform the "reverse" operation. That is, it may invoke receiver.contains(element) for each element of c. So, if we require c to be a Collection<@Nullable ?>, then the implementation may call receiver.contains(null). The best way to require that that is safe is to require receiver to have a nullable element type.

But again, that is safe only if c has a nullable element type, too. We need the nullness of the two element types to match. That brings us to (once it's supported):

<E extends @PolyNull Object> boolean removeAll(Collection<@PolyNull ?> c);

cpovirk commented 4 years ago

(Sorry, I copied the wrong signature into the end of my most recent post. I've now fixed it.)

typetools / checker-framework

Collection.removeAll and retainAll can throw NullPointerException even if annotated conservatively #3197