jspecify / jspecify

An artifact of fully-specified annotations to power static-analysis checks, beginning with nullness analysis.
http://jspecify.org
Apache License 2.0

what happens to type variables when @NullnessUnknown (and @NotNull) type parameters are instantiated? #45

Closed kevin1e100 closed 4 years ago

kevin1e100 commented 5 years ago

Capturing a comment thread here

Let's say there's a class Foo that's not annotated at all, i.e., T's effective specifier is "unknown nullness":

public class Foo<T> {
  private T value;
  T getOrThrow() { return checkNotNull(value); }
  T getOrNull() { return value; }
  void set(T value) { this.value = checkNotNull(value); }
}

Now in some other Java file you do:

@DefaultNotNull
public class Bar {
  String baz(Foo<String> x) {
    return x.getOrNull();
  }
}

Meaning x has the fully explicit type @NotNull Foo<@NotNull String>. So when considering method calls on x, should we (unsoundly) assume that getOrNull() will return a @NotNull String? That set's parameter is a @NotNull String?

@cpovirk points out:

I commented on this somewhere else, saying that it was pretty scary: Even if Foo is @DefaultNullnessUnknown, then sure, Bar's Foo<String> can mean Foo<@NotNull String>. But I hope that doesn't mean that x.getOrNull() is considered to return @NotNull String. I'm hoping that @NotNull gets trumped by the effectively @NullnessUnknown T return type of getOrNull(), per:

"If the bound’s effective specifier is @NotNull or @NullnessUnknown then that is the type variable usage’s effective specifier."

I wrote the sentence quoted above but I don't think I anticipated this application of it. We could possibly do it that way but I think we'll want to be sure either way.

In general I believe the mental model I've been working with is that when a type parameter gets instantiated, the instantiation gets "pasted" everywhere the corresponding type variable is used. So T getOrNull() becomes @NotNull String getOrNull(). I think that's what you want to do if T is declared as <T extends @Nullable Object>. But if T is bounded by @NullnessUnknown, as in the above example, then we could indeed just consider occurrences of T to have unknown nullness no matter how T is instantiated. And while the language quoted above just discusses type variables, and we've been trying not to specify implementation checking and what happens on method calls, it does seem like we want to formalize what Foo<@NotNull String> means in terms of methods declared in Foo.
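
A quick sketch of the two readings, using the unannotated Foo above (nothing here is settled; this just restates the alternatives in code):

Foo<@NotNull String> x = new Foo<>();

// Reading A: "paste" the instantiation into every use of T.
x.getOrNull();   // treated as returning @NotNull String -- unsound if it can in fact return null
x.set(null);     // error: the parameter is treated as @NotNull String

// Reading B: T's bound has unknown nullness, so uses of T stay @NullnessUnknown.
x.getOrNull();   // treated as returning @NullnessUnknown String -- dereferencing it may warn
x.set(null);     // at most a warning: the parameter has unknown nullness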

Playing with IntelliJ's code completion popups, it seems that Kotlin, given x: Foo<String>, indeed types x.getOrNull() : String!, i.e., with unknown nullness as @cpovirk would like. Given y: Foo<String?>, it types y.getOrNull() : String?, i.e., as @Nullable (which also seems sensible but I'm not sure how to specify that).

I'll note that the inverse situation also creates headaches. What I mean is, if Foo was instead declared class Foo<T extends @NotNull Object>, and given a variable Foo<@NullnessUnknown String> z, then should we expect z.getOrNull() to return a @NotNull String or a @NullnessUnknown String? Should we expect z.set(expr) to expect a @NotNull String argument or @NullnessUnknown String? I think the answer here is that it's sensible to use @NotNull String for both method result and method parameter type, though it is weird to get any @NotNull even though we had Foo<@NullnessUnknown String> z. Especially considering that if Foo was declared class Foo<T extends @Nullable Object> then we presumably would expect the same z.getOrNull() to return @NullnessUnknown String.
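
The inverse case, sketched the same way (Foo2 is just a renamed stand-in so both variants can sit side by side):

class Foo2<T extends @NotNull Object> {
  T getOrNull();
  void set(T value);
}

Foo2<@NullnessUnknown String> z;
z.getOrNull();   // arguably @NotNull String: T can only ever be instantiated with non-null types
z.set(...);      // arguably requires a @NotNull String argument, despite the @NullnessUnknown type argument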

Maybe the asymmetry pointed out in the last example is ok. I've been trying to avoid that very asymmetry with wildcards, but if we consider it ok here then I think we may need to revisit it there for consistency.

stephan-herrmann commented 5 years ago

In this example Eclipse warns at the field Foo.value:

The field value may not have been initialized, whereas its type 'T' is a free type variable that may represent a '@NonNull' type

IOW, a type parameter with no nullness constraints must be used with special care. Eclipse also has a separate option to tune the severity of such "pessimistic analysis for free type variables".

kevin1e100 commented 5 years ago

@abreslav I believe mentioned there's a writeup of instantiation rules. Any chance you could share that (or paste them into the proposal doc)?

kevinb9n commented 5 years ago

I believe he means this: https://docs.google.com/document/d/1J44paXrplWzpCsXFpOobeBz_yiVH0W_yfgZR4sJsUfU/edit#bookmark=id.kmx5jvwou7pl

abreslav commented 5 years ago

Did some analysis in the issue linked above: #57

stephan-herrmann commented 5 years ago

Let's say there's a class Foo that's not annotated at all, i.e., T's effective specifier is "unknown nullness":

I disagree, and that's why I filed #62. I would really want to read T as unconstrained in this situation, not unspecified! Otherwise all collection types from the JRE, e.g., could only be instantiated with type arguments of unspecified nullness, which would block a lot of useful analysis.

cpovirk commented 5 years ago

I think that, because we're talking about the spec, our discussions have focused to an almost distorted degree on "unknown nullness." In practice, most unannotated types that users see, especially those from java.*, will come with stubs / external annotations / specifications / something. That allows the checkers to treat ArrayList<E> as ArrayList<E extends @Nullable Object> but Optional<T> as Optional<T extends @NotNull Object>. I see stubs as the "real" solution to unannotated code (well, besides the solution of actually annotating the code directly when possible :)), with @NullnessUnknown as more of a fallback because our checkers have to do something besides crash or give up completely.

Stubs have the advantage that we can catch problems like Optional<@Nullable String>, which we'd permit (albeit with a warning) if the type parameter of Optional had a bound whose nullness was unknown.
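
For illustration, the stubs / external annotations being described would amount to something like this (the syntax is illustrative, not any particular checker's stub format):

// Hypothetical stub content:
public class ArrayList<E extends @Nullable Object> { ... }
public final class Optional<T extends @NotNull Object> { ... }

// With those bounds available, a checker can report:
Optional<@Nullable String> bad;   // error (or warning): @Nullable violates T's @NotNull bound
ArrayList<@Nullable String> ok;   // fine: E permits nullable type arguments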

All that said, I think I was confused (and thus misleading) in what I said above. I'm going to reread it and see if I can do any better.

cpovirk commented 5 years ago

(Sorry, reading what I just wrote below, I'm unsure whether it's much help or not. I'm posting it, anyway, partially so that I have it for my own reference. But feel very free to skip reading it.)

OK, I think that what I said does align with the spec text, but all the credit there goes to the spec, because my mental model wasn't very well defined before I read it.

The way part of me thinks of it is -- again, keeping in mind that the real checkers won't see Map this way because they'll have stubs -- that an unannotated Map class would effectively look like this to the checker:

public interface Map<K extends @UnknownNullness Object, V extends @UnknownNullness Object> {
  @UnknownNullness V put(@UnknownNullness K key, @UnknownNullness V value);
  ...
}

In other words, part of me thinks of "undefined nullness" as "pasted" not only onto each type variable bound but also onto each type parameter use. (I mean this in a way contrary to what the spec says, even though the spec says something sort of like this. But I endorse the spec; I am just explaining how I originally thought of this in case that somehow helps someone.) That would mean that, even if the spec didn't have its current specific rule that says to make every usage of K and V have unknown nullness (on account of their bound), then no matter what type Map is instantiated with, the return type and parameter types of put would have unknown nullness, thanks to the "pasted" @UnknownNullness on those type-parameter usages.

I guess my thinking is as follows: In the extremely implausible event that Map.java were annotated as follows...

@DefaultNotNull
public interface Map<@UnknownNullness K, @UnknownNullness V> {
  @Nullable V put(K key, V value);
  ...
}

...then it seems reasonable for put on a Map<@NotNull K, @NotNull V> to require @NotNull inputs. After all, the type usages are annotated (by the default annotation); the only uncertainty that we're declaring is over whether K and V should be permitted to be @NotNull or @Nullable in the first place. Checkers might well warn about that at the instantiation site, but once they've done that, they've done their job: They don't need to warn about usage sites, too.
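
A usage sketch of that reasoning, reusing the implausibly annotated Map above (the checker reactions shown are illustrative, not prescribed):

@DefaultNotNull
class Caller {
  void demo(Map<@NotNull String, @NotNull Integer> map) {   // the instantiation itself might warn
    map.put("k", 1);                          // OK: both parameters substitute to @NotNull types
    map.put(null, 1);                         // error: the key parameter is @NotNull String here
    @Nullable Integer old = map.put("k", 2);  // the return type is @Nullable Integer
  }
}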

But, again: I do prefer the rules that the spec currently has. I think they work well in practice because they treat a completely unannotated type as if all its parameter and return types, whether they use type parameters or not, have unknown nullness. And that's what I really want, since it produces suppressible warnings for cases the checker is unsure about. If users want more than that, they should annotate the type directly or in stub files.

kevinb9n commented 5 years ago

Let's say there's a class Foo that's not annotated at all, i.e., T's effective specifier is "unknown nullness":

I disagree, and that's why I filed #62. I would really want to read T as unconstrained in this situation, not unspecified! Otherwise all collection types from the JRE, e.g., could only be instantiated with type arguments of unspecified nullness, which would block a lot of useful analysis.

I think interpreting T as unconstrained very likely means that if we were analyzing that (e.g. JRE) class we would probably find violations. Since we're not analyzing that code and those violations aren't being reported, that means the problems they relate to might show up somewhere else.

Well, surely parameterizing with a nullness-unspecified type argument should be safe and normal. But parameterizing with anything else seems like it should make you suppress a warning to indicate that you're aware of the risk. Of course, to the degree that's annoying, that's supposed to motivate users to bite the bullet and get/provide the right stub files, and that seems kinda working-as-intended to me at the moment.

kevin1e100 commented 5 years ago

I suspect there's some kind of disconnect here, specifically around how a type parameter in a legacy file (i.e., a type parameter of "unknown nullness") is different from a known-parametric type parameter (here often referred to as a type parameter whose upper bound is a nullable type, but the idea is just a type parameter for which it is somehow known (typically through some form of annotations) that it can be instantiated with either a nullable or a non-null type, e.g., the type parameter of j.u.List<T>).

The former will commonly appear in legacy .java files where no one has taken the time to provide any annotations; let's imagine a fictitious Legacy<T> class for that. The latter will commonly appear in .java files that have annotations, for instance, after someone annotated j.u.List, either externally or directly in the source file, so let's assume someone annotated List and we use that as an example of an interface with a known-parametric type parameter in the discussion below. Finally, let's assume someone also annotated j.u.Optional<T>, but (let's assume) its annotations limit instantiations of T to non-null types (which would make sense in this case I believe).

Now what's the difference? I believe the desire in both cases is that they can be instantiated with type arguments of any nullability. That is, L<@NotNull String>, L<@Nullable String>, and L<@NullnessUnknown String> should all be possible for both L = Legacy and L = List. Why? For List that's the explicit intent, and for Legacy, well, we don't know any better, so it should probably be allowed (though one could consider generating warnings). (Conversely, only Optional<@NotNull String> should clearly be allowed, while Optional<@Nullable String> should probably result in some kind of "illegal type" error, and Optional<@NullnessUnknown String> also feels suspect if not outright wrong.)

But on the other hand it feels like something is fishy about just instantiating Legacy<T> with anything. That's what @cpovirk referred to as "pretty scary" in the doc comment quoted above. The issue is that because Legacy hasn't been annotated, we don't know if any of the uses of its type variable are actually parametric with how T was instantiated. For instance, Legacy may define a method T getOrNull() that could always return null, even when called on a Legacy<@NotNull String>. It may define a method setOrClear(T) that could always accept null. Or it could have methods that never accept or return null in places where the type variable T is used.

Because of that it's problematic and afaict not "typesafe" to just "paste" any type argument where T appears in Legacy's API. What I mean by that is, it's not safe to assume that a call x.getOrNull() on a variable declared Legacy<@NotNull String> x will return a non-null string, for instance. On the other hand, doing that for y.get(0) on List<@NotNull String> y is ok. (Similarly, btw, it's problematic / unsafe to let List just return null from get(int), since the type parameter allows non-null types. Conversely, in Legacy's method bodies we again don't really know what's ok and what isn't, though we probably won't run a checker over unannotated Legacy's implementation.)

One way of dealing with this is to consider all uses of Legacy's type variable in Legacy's API as having "unknown nullness" regardless of how Legacy's type parameter is instantiated, which is what this issue was originally filed about. Doing that allows tools to raise warnings when dereferencing unknown nullness method results coming from Legacy, and to raise warnings when passing nullable values into Legacy method parameters. Conversely, with List, checkers should likely "paste" any type argument's nullness into any type variable use in List's method parameters or results (unless they were further annotated). And with Optional, its type variable uses can just be considered to represent non-null types (again unless further annotated).
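
To make that contrast concrete, a sketch (Legacy is a made-up unannotated class; the List and Optional bounds stand in for whatever external annotations they would get):

class Legacy<T> {        // unannotated: T's bound has unknown nullness
  T getOrNull();
}
// interface List<E extends @Nullable Object>  { E get(int index); }   // annotated, parametric
// class Optional<T extends @NotNull Object>   { T get(); }            // annotated, non-null only

Legacy<@NotNull String> x;
x.getOrNull();   // @NullnessUnknown String: the instantiation is not pasted into uses of T

List<@NotNull String> y;
y.get(0);        // @NotNull String: the type argument is pasted into parametric uses of E

Optional<@NotNull String> z;
z.get();         // @NotNull String: uses of T are non-null regardless of the instantiation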

Nonetheless I want to emphasize that both known-parametric (here often called @Nullable-bounded) type parameters and type parameters "of unknown nullness" (typically appearing in unannotated legacy types) can be instantiated any which way, but at least for a sound checking tool it'll be important to treat those instantiations differently: trust them when the type parameter is known-parametric, but use "nullness unknown" everywhere if the type parameter itself is "of unknown nullness".

One source of confusion here, and I'm still not sure I'm explaining all this very well, is that the term "unknown nullness" is decidedly ambiguous when used for type parameters since "unknown" could refer to a known-parametric type parameter, where at the declaration site we don't know whether it'll be instantiated with a nullable or non-null type. That is generally not what I have meant when I've talked about type parameters with unknown nullness. Instead I was referring to situations like Legacy as described above.

stephan-herrmann commented 5 years ago

Much of what @kevin1e100 writes here gives a nice motivation for #62, and I concur in many regards.

What sounds strange to me is the proposal to (a) allow instantiating the legacy type parameter with any nullness, but (b) ignore the instantiation insofar as references to that type variable are still seen as unspec'ed types. As a user who instantiates Legacy<@NotNull String> I would feel cheated if that annotation has no positive effect. At least I expect some feedback on whether I'm using this instantiated type in a consistent way, e.g., passing null to a parameter of type T should be flagged.

I know this is a muddled area, and the question of how to distinguish Legacy from List is a tough one. In my GenericsEclipsePOV essay I outlined the heuristics employed by Eclipse for deciding whether a T return type can be trusted to support the contract of a nonnull instantiation. It's not perfect, but for this migration / legacy topic it helps to put the blame where it belongs without causing trouble in the core semantics.

cpovirk commented 5 years ago

Thanks. I came here after reading your email and essay, but thanks for the points here, as well.

In my original comment, I was concerned about the bad side of trusting Legacy<@NotNull String>: Checkers would assume that a String that comes out of the object is non-null (but more on that later). I had completely overlooked what you point out, which is that we can offer a good side: Checkers would force callers to put in only non-null strings. I can understand feeling cheated if this checking doesn't happen.

(From that perspective, there's some sense in allowing even Legacy<@Nullable String>, which lets users opt in to stricter checking of values coming out of the object (but then also see looser checking of values put in). But as you point out in your email (which I'm still processing, along with your essay), we already have conflicting requirements for the hierarchy of nullnesses. Permitting Legacy<@Nullable String> for Legacy<T extends @UnknownNullness Object> would make things worse.)

On the other hand, I still don't love the possibility that Legacy<@NotNull String> would usually effectively mean "Set T to @NotNull String, and also trust that no occurrences of T in the Legacy API should have had overriding nullness annotations." I think that part of what bothers me is the non-local-ness: A suppression on the declaration effectively affects the use sites. Another part may be that I usually expect a suppression to apply to an expression, not to a type. So I might see @SuppressWarnings("nullness") Legacy<@NotNull String> foo = client.getFoo(); and think that the problem is that client.getFoo() doesn't declare nullness, when in fact the problem is that Legacy doesn't declare nullness.

I understand that Eclipse has heuristics to catch the problems this would otherwise create. I'm wondering if the way to go for both of us, though, is to embrace unknown nullness by getting in the habit of writing Legacy<@UnknownNullness String>. That will produce warnings in similar cases(?) to Eclipse. And I might not care too much what instantiations are possible (since we have enough difficultly already with the hierarchy of nullnesses), given that there's one that produces the behavior that I want.

I guess my main remaining concern is whether Legacy<@NotNull String> (typically written as just Legacy<String>) will produce a noisy enough warning for me to notice it. I can imagine running in "lenient" mode (i.e., with most unknown-nullness warnings suppressed) a lot of the time, but the danger of interpreting Legacy<@NotNull String> as discussed a couple paragraphs ago feels somehow worse to me than a typical nullness warning. But does that really justify making it an error? (And would a suppressible error be better or worse?) Or maybe I just need to get over the idea that it's a "worse" kind of warning.

[edit: a late remark on non-local-ness: Many suppressions of nullness warnings can create a NullPointerException later, so we could call those "non-local," too. (In that way, they're just like unchecked warnings.) I still see this case as sort of different, since it's effectively disabling what would otherwise be nullness warnings elsewhere in the code. But I grant that actual runtime effect is the same.]

kevin1e100 commented 5 years ago

Just to clarify, my previous comment was mostly to say, in a lot of words, that there seem to be benefits from distinguishing "unknown" (aka unspecified, legacy, whatever) from known-parametric (and known-non-parametric) type parameters.

One benefit of doing that is that it seemingly allows us to more precisely define the heuristics that should be used for "unknown" type parameters, and when to use the heuristics. There's a strong desire to leave those heuristics up to individual checkers, but by defining which type usages (including uses of type variables) have "unknown" nullness we identify where those heuristics should (likely) apply. If a particular tool's heuristics include issuing warnings when a nullable string is passed into a Legacy<@NotNull String> method, and not warning when dereferencing method results coming from such a thing, that's fine as far as I'm concerned. (Though I believe it's not always "safe" to go by these particular heuristics, and they may be noisy sometimes, so some tools may not want to use them. kotlinc for instance uses a slightly different approach, as an example.)

With that in mind I think it does make sense to assign "unknown" nullness to usages of type parameters which are themselves "unknown" (rather than known-parametric), though I struggle with this myself (see my initial description of this issue), and there could be other ways of achieving this. But I think the intuition is this: if we don't know anything about the type parameter then we probably also shouldn't blindly assume that usages of that type parameter in method parameter and return types are parametric (certainly doesn't seem safe to do so): any given use may be parametric (like List.get), but it seems like it can also always be nullable (like Map.get), or always non-null, and we just have no way of knowing that in a legacy class, unless I'm missing something. Conversely, if someone took the time to mark a type parameter as parametric (or non-parametric) then it seems reasonable to assume that they also went over the usages of that type parameter and annotated any occurrences that need it. In other words, once a type parameter is marked as parametric we can assume any usages to be parametric unless annotated, and it seems safe to do so.

cpovirk commented 4 years ago

Copying over some discussion from #74, which I've closed, since it's about one specific case of the general rule discussed here. (Specifically, it's about the case that I hadn't thought of in my original comment quoted in the first post here.)

In #74, I wrote:

(Previously I gave an example predicated on using @NullnessUnspecified for Map.get. But there may be less support for that than I'd thought, and anyway, I think this is worth considering independent of that.)

Any generic type that isn't annotated for nullness is likely to have a method that we consider to return @NullnessUnspecified T. For example[*]:

class Supplier<T> {
  T get();
}

It seems pretty clear that, if I write Supplier<@Nullable String> (probably receiving a warning), I expect for get() to return a @Nullable String.

But the design proposal currently has the following rule:

If the bound’s effective specifier is @NotNull or @NullnessUnknown, then that is the type variable usage’s effective specifier.

So get() returns a @NullnessUnknown String. That's still pretty good, but I can still imagine that users would feel "cheated" here that we didn't respect the @Nullable annotation.

The purpose of the current rule is, as I understand it, more about @NotNull. We don't want to assume that an unannotated method that returns T is meant to return exactly T as opposed to @Nullable T. (@Nullable T, incidentally, would be the correct return type for an almost identical-looking get() method, the one on java.lang.ref.Reference.)
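
Side by side, the two nearly identical declarations being contrasted (the annotations shown are the ones an annotator would presumably choose, not anything specified):

class Supplier<T> {
  T get();              // plain T: parametric, tracks whatever T is instantiated with
}

class Reference<T> {
  @Nullable T get();    // a cleared Reference returns null even when T is non-null
}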

Maybe what we really want is a rule about picking the "most nullable" of the nullnesses that are "in play." The type parameter bound says "unspecified," but the instantiation says "nullable," so we go with nullable.

However, such a rule makes sense only for unspecified nullness, not in general: If the type parameter bound says "nullable" but the instantiation says "not null," then we still want not null. We could have a special case for unspecified nullness (as arguably we already do), but it would feel nice for it to be less of a special case.

This brings me back to an idea I mentioned once before (in the last paragraph of this post): What if, when determining the nullness of a type-variable usage, we didn't look at the bound at all? Then, to make unspecified nullness work the way we want it to again, we treat defaults as also applying to type-variable usages -- but only in combination with the type argument that the class is instantiated with (by picking the "most nullable").
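
One way to picture that rule (a purely illustrative pseudo-implementation; the enum, the ordering, and the helper name are my assumptions, not spec text):

// "Most nullable wins", using the ordering not-null <: unspecified <: nullable.
enum Nullness { NOT_NULL, UNSPECIFIED, NULLABLE }

static Nullness substituted(Nullness usageNullness, Nullness typeArgumentNullness) {
  // usageNullness is the type-variable usage's own annotation or applicable default;
  // the bound is deliberately not consulted.
  return usageNullness.ordinal() >= typeArgumentNullness.ordinal()
      ? usageNullness
      : typeArgumentNullness;
}

// Examples: substituted(UNSPECIFIED, NULLABLE) == NULLABLE      (the Supplier case above)
//           substituted(UNSPECIFIED, NOT_NULL) == UNSPECIFIED
//           substituted(NULLABLE,    NOT_NULL) == NULLABLE      (an annotated getOrNull-style member)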

(This better fits my mental model for parametric nullability in unannotated classes, and it also produces a better (IMO) result for incrementally annotating a type, as discussed in bullet 2 in a post I linked earlier.)

I'm still not sure of the right way to communicate all this. I feel like it wouldn't be significantly more complex than the existing proposal once it's written in definition form. But people would need an intuitive feel for it, too, and I'm not sure I'm getting the intuitive model across.

[*] edit: Originally I used Reference as my example throughout, rather than using Supplier at the beginning and switching to Reference briefly later, as I now do. The problem with using Reference at the beginning: I would probably annotate Reference as Reference<T extends @NotNull Object>, forbidding Reference<@Nullable Object>. (The class can still support all callers because its constructor would accept a @Nullable T, and its get method would return a @Nullable T.)

cpovirk commented 4 years ago

And @kevinb9n replied:

Any generic type that isn't annotated for nullness is likely to have a method that we consider to return @NullnessUnspecified T. For example: class Reference<T> { T get(); } It seems pretty clear that, if I write Reference<@Nullable String> (probably receiving a warning), I expect for get() to return a @Nullable String.

So far I agree: you probably do expect that, and you probably should get a warning, and the question of whether you will get what you expect or not after ignoring/suppressing that warning is probably unanswered. (I believe you should, because (a) look, you were warned, and (b) otherwise the rules for interacting with that reference are different depending on the foreign definition of Reference, and we don't like that.)

But the design proposal currently has the following rule:

If the bound’s effective specifier is @NotNull or @NullnessUnknown, then that is the type variable usage’s effective specifier.

So get() returns a @NullnessUnknown String.

I believe this part of the spec is only talking about type usages of the type variable itself (which can only occur inside the scope of the Reference class). When analyzing that class itself, T would be seen as having unspecified nullness.

Reference<@Nullable String> is a parameterization of Reference and has substituted away all occurrences of T, so it's our defined substitution rules that will govern what nullness the get() return type has. As argued above I think @Nullable should probably win.

So let me pause there and ask you whether that changes the rest of what you want to say here.

cpovirk commented 4 years ago

First of all, again, thanks. @kevin1e100 had already tried to explain this to me in the "pasting" model described above, but I had still been under the impression that there was a special case for @NullnessUnspecified, and I forgot to circle back here. But now I understand why @kevin1e100 "didn't anticipate" this application of the rule, which previously appeared to me to be exactly the intended application of said rule :)

I would like to make the case that the spec is unclear on this:

I should acknowledge that this behavior is in fact "obvious." But I think it's also obviously dangerous, as discussed in the first post here. Given that, when I saw a phrase in the doc that appeared to be solving the problem, I jumped on it.

(Aside: In the long term, it may be useful for our docs to acknowledge the cases in which we've set rules that we recognize to be unsound.)

We can try to make the doc clearer, but before we rush to do that, I want to make the case for different behavior (and also say another thing or two, to no one's surprise :))

cpovirk commented 4 years ago

For the actual most direct discussion of this, we have to go to the glossary. It discusses "substitution" (which perhaps the design proposal should mention explicitly), and it distinguishes between "member of a class" and "member of a type," with the latter applying substitutions. The glossary emphasizes that a member of a type is "real... despite not having been defined explicitly," but I think what we need is a sharper distinction between which rules act on members of the class and which act on members of the type. In particular, I doubt that any one rule acts on both. (Hmm, except maybe that static methods aren't considered to be members of types, yet I'd expect some rules that apply to members of types to apply to static methods, too.)

Relatedly, I know that @kevinb9n recently added the idea of "inherited members" (of a class) to the glossary. I spent some time trying to form an opinion on whether performing substitution on these members -- an operation we don't perform until we switch to looking at members of a type -- is the way to go. I want to say that this follows the JLS (8.4.8, 4.5.2, 8.1.4). But I still have another form of my previous confusion: When do we inherit members from a class instead of from a type? I will try to think about this more. My instinct is to transition from talking about members of classes to talking about members of types ASAP in general.

cpovirk commented 4 years ago

In case the actual behavior differences between my proposal and @kevinb9n's interpretation are unclear, here's an example, in which I've tried to hit all the practical differences between the two.

// Here's a simple class:

/**
 * Contains a non-null value, or is empty (which <i>some</i> of its APIs treat as
 * "null-like").
 */
@DefaultNotNull
class BoxOrEmpty<T> {
  @Nullable T getOrNull();
  T getOrThrow();
  void setOrClear(@Nullable T t);
  void set(T t);
}

// Now suppose that it's unannotated instead. It would look as follows:

[@DefaultNullnessUnspecified]
class BoxOrEmpty<T> {
  T getOrNull();
  T getOrThrow();
  void setOrClear(T t);
  void set(T t);
}

// [edit: "MODEL 1"]
// Now let's instantiate it under my proposed rules. I've left a comment on every line
// where the substituted type is different from what it would be with the fully annotated
// class.

// If we instantiate it with `T = @NotNull String`, every `T` becomes
// `@NullnessUnspecified String`.
BoxOrEmpty<@NotNull String> box;
box.getOrNull() // warning should be error; would prefer @Nullable
box.getOrThrow() // unnecessary warning; would prefer @NotNull
box.setOrClear(...) // unnecessary warning; would prefer @Nullable
box.set(...) // warning should be error; would prefer @NotNull

// If we instantiate it with `T = @Nullable String`, every `T` becomes `@Nullable
// String`. Note that this instantiation would PRODUCE A WARNING, since we can't verify
// that the type was meant to be instantiated that way. (And in fact it's NOT intended to
// be.)
BoxOrEmpty<@Nullable String> box;
box.getOrNull()
box.getOrThrow() // [edit: unnecessary error; would prefer @NotNull]
box.setOrClear(...)
box.set(...) // unsound; would prefer @NotNull

// The only way to get NPE is to call a method that issues a warning (or error) or, in
// the case of box.set(...), to have instantiated the type in a way that produced a
// warning.

// [edit: "MODEL 2"]
// Now let's instantiate it under the assumption that every usage of `T` always takes on
// the nullness of the type argument used in instantiation. (This follows @kevinb9n's
// reading of the doc.)

// If we instantiate it with `T = @NotNull String`, every `T` becomes `@NotNull String`.
BoxOrEmpty<@NotNull String> box;
box.getOrNull() // unsound; would prefer @Nullable
box.getOrThrow()
box.setOrClear(...) // unnecessary error; would prefer @Nullable
box.set(...)

// The results of the instantiations with `T = @Nullable String` match mine.

// Pros of always using the nullness from the type argument (i.e., @kevinb9n's reading):
// - There are fewer "bad" cases. (Not pictured: the case of `Pointer<T extends @Nullable
//   Object>`, instantiated with `T = @Nullable String`, in which both sets of rules
//   agree on the types but my proposal issues an unnecessary warning.)
//
// Cons of always using the nullness from the type argument (i.e., @kevinb9n's reading):
// - The consequences of the "bad" cases are worse: To ensure that NPE is possible only
//   when issuing a warning (or error), we would have to issue a warning for
//   instantiating with either @NotNull OR @Nullable. We'd have to say that the only
//   warning-free instantiation is BoxOrEmpty<@NullnessUnspecified ...>. We can do that,
//   but:
//   - It may discourage users from declaring the type the way it "should" be. Then, if
//     BoxOrEmpty is subsequently annotated, users might not update their usages.
//   - It may discourage users who know that @Nullable is the right type from getting
//     enforcement with errors, rather than just warnings.
//   - [edit: It feels especially weird to warn about instantiating with @NotNull when
//     that's likely to *always* be a permitted way to instantiate a type. It feels more
//     like we're warning the user that *we* did something wrong, not that the user did.]
// - Instantiating with @NotNull can produce an unnecessary error (not just a warning).
// - [edit: It's possible that our general rules for @NullnessUnspecified will force us
//   to generate a warning for BoxOrEmpty<@NullnessUnspecified ...>, even if we'd rather
//   not.]

//////////////////////////////////////
//////////////// EDIT ////////////////
// in response to @kevin1e100 below //
//////////////////////////////////////

// [edit: "MODEL 3"]
// Finally, let's instantiate it under the assumption that usages of `T` remain
// `@NullnessUnspecified`. (This follows my original interpretation of the doc, quoted at
// the beginning of this thread.)

// The results of the instantiations with `T = @NotNull String` match mine.

// If we instantiate it with `T = @Nullable String`, every `T` becomes
// `@NullnessUnspecified String`.
BoxOrEmpty<@Nullable String> box;
box.getOrNull() // warning should be error; would prefer @Nullable
box.getOrThrow() // warning should be error; would prefer @Nullable
box.setOrClear(...) // unnecessary warning; would prefer @Nullable
box.set(...) // unnecessary warning; would prefer @Nullable

kevin1e100 commented 4 years ago

So I think in this example at least, your proposal ends up behaving like kotlinc. I'm confused by these comments though:

// Now let's instantiate it under the current rules in the proposal doc. // If we instantiate it with T = @NotNull String, every T becomes @NotNull String. // The results of the instantiations with T = @Nullable String match mine.

Aside from the fact that this issue was filed because the proposal doesn't explicitly spell out how to treat methods of an instantiated type like BoxOrEmpty<@NotNull String>, the proposal seems to currently imply that every T becomes @NullnessUnspecified String regardless of instantiation (assuming BoxOrEmpty is unannotated). But the above comments say something different, so I'm worried that I am lost and unclear what the comments refer to.

I'm claiming the proposal implies @NullnessUnspecified because of the wording "If the [type parameter's] bound’s effective specifier is @NotNull or @Nullness[Unspecified], then that is the type variable usage’s effective specifier [unless the type variable usage is explicitly annotated]".

To reiterate an earlier point, this was written to reason about method declarations, to give tools a way to compare a method's implementation code with the declared parameter and return types. But, I see more clearly now that we also need to specify what happens once type parameters are instantiated, at the very least so we can specify overriding rules.

But we could maybe say something like this to get the effect claimed above: "effective nullability specifiers of Members of a Parameterized Type derive from the effective nullability specifiers inferred for the underlying Generic Class, with parametric nullability specifiers replaced by the effective specifiers of the respective Type Arguments of the Parameterized Type". (This is very dense: glossary terms in the previous sentence start with Capital Letters and terms defined in the proposal are in italics to hopefully help navigation.)

I'm really not sure that's the right way of going about this, but this does seem consistent with the current wording cited above and it does seem to allow sound reasoning while letting "lenient" tools ignore a lot of potential problems (possibly to their detriment).

An alternative rule would be to always substitute all type variables with their corresponding type arguments including their effective nullability specifiers. That's the rule that seems the easiest to understand, but it appears unsound. Another possible rule, consistent with the previous post I believe, is to substitute the weaker of the type argument's and the type variable's effective nullability specifiers. Note that all three rules only effectively differ in how they treat @NullnessUnspecified T type variables (!), which is why I had hoped we could leave this up to tools.

cpovirk commented 4 years ago

I'm happy to hear that my proposal matches kotlinc here. I like the Kotlin nullness model, and I like having the evidence that their model can be implemented.

I'm confused by these comments though:

// Now let's instantiate it under the current rules in the proposal doc. // If we instantiate it with T = @NotNull String, every T becomes @NotNull String. // The results of the instantiations with T = @Nullable String match mine.

Aside from the fact that this issue was filed because the proposal doesn't explicitly spell out how to treat methods of an instantiated type like BoxOrEmpty<@NotNull String>, the proposal seems to currently imply that every T becomes @NullnessUnspecified String regardless of instantiation (assuming BoxOrEmpty is unannotated). But the above comments say something different, so I'm worried that I am lost and unclear what the comments refer to.

Sorry, I am continuing to be confusing as I try to work through this. I've tried to edit my post to clarify, and I'll reply directly here.

As you go on to say, there are at least 3 models on the table for T extends @NullnessUnspecified. Sweeping some details under the rug, they are:

  1. My proposed model: The nullness of @Foo T after instantiation/substitution is the "most nullable" value between @Foo and the nullness T is instantiated with.
  2. The way that you and @kevinb9n appear to have originally interpreted the spec (the Many Kevins Interpretation): The nullness of @Foo T after instantiation/substitution is @Foo, or, if there is no @Foo, then it's the nullness T is instantiated with.
  3. The way that I interpreted the spec, which you have said that it at least "implies": The nullness of @Foo T after instantiation/substitution is @Foo or, if there is no @Foo, then it's @NullnessUnspecified. (There is also model 3.5, the Eclipse model, which I believe is somewhat similar to 3 but differs in a few respects: It applies only when instantiating with @NotNull and only to return types that are naked type variables. (This is "Dealing with legacy" in GenericsEclipsePOV.) In other respects, it's like model 2, and maybe you can squint to see a little of model 1 in that it treats @NotNull and @Nullable differently.)

Above, I was talking only about model 1 and model 2. Why? Well, I like model 1, for starters :) Then, I took @kevinb9n to be advocating for model 2 in the post I moved/quoted above. (It's possible that he was simply explaining, not advocating for any particular resolution.) Model 3 still strikes me as reasonable, but I think I prefer model 1 -- unless maybe model 3 ends up easier to implement or explain. But I haven't tried to precisely specify any of them. Speaking of which:

But we could maybe say something like this to get the effect claimed above: "effective nullability specifiers of Members of a Parameterized Type derive from the effective nullability specifiers inferred for the underlying Generic Class, with parametric nullability specifiers replaced by the effective specifiers of the respective Type Arguments of the Parameterized Type".

(Note to self: When I first read this, I managed to forget the key piece of information we've been discussing this whole time: T extends @NullnessUnspecified is not considered to have parametric nullability. In my proposed model, it ends up "half" parametric: In cases like analyzing what an implementation can do with a T parameter or return type, it's not parametric. In actual substitution, it arguably qualifies as "parametric." At the very least, it's subject to being replaced in the way that parametric types are in the paragraph above.)

This does sound like model 2 to me. Thanks for writing it up.

Note that all three rules only effectively differ in how they treat @NullnessUnspecified T type variables (!), which is why I had hoped we could leave this up to tools.

Yeah! I do think it's important for us to say something here as a practical matter, though I don't object to making it (and everything else we say, really) more of a "recommendation" if that helps. (This touches on #73.)

kevinb9n commented 4 years ago

Here's my current attempt at high-level overview of this issue, though I am far from having fully digested everything discussed in here:

The premise here is class Foo<T extends @NullUnspec SomeType> (irrespective of whether explicit, via defaulting or default-defaulting), and to close this issue we will need to decide:

  1. What is the treatment (terminology, behavior, etc.) of type usages of the type variable T itself inside that scope (annotated or not)?
  2. What type arguments will be legal/illegal/"warning"?
  3. Given a type argument, how does substitution operate on type usages of T?

I think if we answer those we should be done. There is a possible fourth question, "Will the resulting parameterized type have 'asterisks' attached, that result e.g. in warnings later on, whereas the same parameterized type arising from different circumstances might not?" But I think we should rule that out right now. The asterisk is too weird. That still wouldn't preclude checkers from deciding to issue whatever additional warnings for whatever additional reasons they might want to.

It is likely that the behavior we end up with won't spark joy for anyone; the premise is that there is critical information missing, which we would need to produce good results, and the user has the option of providing that information e.g. via a stub file. They can improve their lot that way, but we just need some reasonable approach to cover the case they do not.

Addressing these 3 points (again, for class Foo<T extends @NullUnspec SomeType>):

  1. Postpone: This probably is best decided after #s 2 and 3, since the main reason to make one decision vs. another is whether it causes trouble down the road for usages.

2a. I think Foo<@NullUnspec Bar> should be fine (maybe subject to #69).

2b. Foo<@Nullable Bar> is of course generally unsafe. We could declare it to be illegal, but I think we're exploring the possibility of a useful intermediate approach where it generates a warning, prompting the developer to check her assumptions before suppressing and moving on. And that seems fair, and like it's maybe a small ramp for some of the adoption cliffs. By suppressing (or, sigh, ignoring) the warning, the developer is in effect saying "T could safely be marked as <T extends @Nullable SomeType> except for some (possibly empty) set of issues all of which I am taking responsibility for steering clear of now." Note that this warning should probably not be suppressed with the "ordinary" lenient-v.-strict-mode suppression string, but require a special one, to reflect that this case has greater nonlocal consequences (maybe).

2c. Foo<@NotNull Bar> is perhaps the same situation as 2b, except that the developer is either asserting that T's bound could have been marked @Nullable or asserting that it could have been marked @NotNull. But I think it works out the same. Maybe.

  3. As for substitution, it may be unlikely that there are many type variable usages of T within Foo that are annotated (since this is probably an entirely unannotated file), but if there are, then I don't see why we shouldn't respect the explicit annotations and let them fully override everything else. That reason might be somewhere up above in this thread, but I haven't processed it all yet. For unannotated appearances of T, if we accept my description above of the sort of assertion the developer is trying to make, then I think that supports letting the nullness of the type argument win. I think this is model 2 in Chris's comment?
kevinb9n commented 4 years ago

In addition to 2a/2b/2c I guess we need to consider all the wildcard possibilities too.

kevinb9n commented 4 years ago

(It's interesting that the concept that non-nullable <: unspec <: nullable, which seems so useful in checking ordinary type conversions, seems to be an almost useless tool in hammering this one out. Trying to use that guidepost would suggest that extends @Nullable is the way to say that all three argument nullnesses are okay, extends @NullUnspec means all except @Nullable are okay, and extends @NotNull means only @NotNull is okay. But in practice, that asymmetry of the middle case doesn't seem to make real sense. I think maybe this situation is a consequence of the inherent asymmetry of Java allowing <T extends> but not <T super>.)

[Edit: this topic now seems to be #76]

cpovirk commented 4 years ago

I think your issues 1-3 cover it. (People following along are likely aware that we have other issues about @NullnessUnspecified, like #33 and #69, but those are, well, other issues.)

There is a possible fourth question, "Will the resulting parameterized type have 'asterisks' attached, that result e.g. in warnings later on, whereas the same parameterized type arising from different circumstances might not?" But I think we should rule that out right now.

Agreed.

It is likely that the behavior we end up with won't spark joy for anyone; the premise is that there is critical information missing, which we would need to produce good results, and the user has the option of providing that information e.g. via a stub file.

Agreed again. @kevin1e100 proposed a couple relevant nice-to-have principles:

  • Running a checker in "lenient mode" over unannotated files should succeed.

  • Our rules should permit a sound checker (even when compiling against unannotated code -- a differentiator relative to even the existing Checker Framework), even though most tools might never implement a "strict mode." We expect to decide not to be sound in various cases, but those should be conscious decisions.

  1. Postpone: This probably is best decided after #s 2 and 3, since the main reason to make one decision vs. another is whether it causes trouble down the road for usages.

Agreed. This seems like the easy case, so we can adapt based on our decisions for issues 2 and 3.

2a. I think Foo<@NullUnspec Bar> should be fine (maybe subject to #69).

I haven't given this case a ton of thought. I probably should think about it more.

I definitely agree that it's not an error: This ties into "Running a checker in 'lenient mode' over unannotated files should succeed."

I do wonder if "should be fine" will mean "may generate a warning." This might fall out of https://github.com/google/codeanalysis-annotations/issues/33#issuecomment-538138533, for better or for worse -- if we go that direction. I'm not sure I want the warning in that case, but it might end up hard to avoid, especially in light of https://github.com/google/codeanalysis-annotations/issues/33#issuecomment-543327013 + https://github.com/google/codeanalysis-annotations/issues/33#issuecomment-543328528.

(As noted above, it seems likely that most tools will never be run in "strict mode." Maybe some tools won't even support generating warnings at all, instead silently permitting code with "warnings?" This might simplify implementation.)

2b. Foo<@Nullable Bar> ... we're exploring the possibility of a useful intermediate approach where it generates a warning.... Note that this warning should probably not be suppressed with the "ordinary" lenient-v.-strict-mode suppression string, but require a special one, to reflect that this case has greater nonlocal consequences (maybe).

(I am likewise unsure whether this warning would be special or not.)

2c. Foo<@NotNull Bar> is perhaps the same situation as 2b, except that the developer is either asserting that T's bound could have been marked @Nullable or asserting that it could have been marked @NotNull. But I think it works out the same. Maybe.

Foo<@NotNull Bar> is potentially different because it should always be a valid way to instantiate Foo, whether its T should have been T extends @Nullable or T extends @NotNull. (This assumes, though, that we're not permitting class Foo<@Nullable T> as a way of saying that Foo can be instantiated only with @Nullable types (#60). I personally see this more as a reason to discourage/forbid class Foo<@Nullable T> than a reason to warn about the instantiation Foo<@NotNull Bar>.)

But, if we follow model 2, then I do think we should recommend a warning here for soundness. This is a reason that I like models 1 and 3.

But but, if few people are going to run with warnings on in the first place, maybe there's no reason to care too much about this. Or maybe it's a reason to consider that special category of warning you mentioned in 2b, which checkers might want to implement even if they don't implement normal warnings? (But I guess checkers can pick and choose themselves even if we don't officially differentiate the two.)

  3. As for substitution, it may be unlikely that there are many type variable usages of T within Foo that are annotated (since this is probably an entirely unannotated file), but if there are, then I don't see why we shouldn't respect the explicit annotations and let them fully override everything else. That reason might be somewhere up above in this thread, but I haven't processed it all yet. For unannotated appearances of T, if we accept my description above of the sort of assertion the developer is trying to make, then I think that supports letting the nullness of the type argument win. I think this is model 2 in Chris's comment?

I think that all 3 models propose to respect the annotation in some way. Models 2 and 3 both do it exactly as you say. Model 1 deserves a longer treatment than I can give it here, but it's equivalent in the common case and different in some other cases.

[edit: Sorry, I ignored the heart of what you were getting at, and I plain misread this. I've posted a followup.]

In addition to 2a/2b/2c I guess we need to consider all the wildcard possibilities too.

I have mostly been assuming that they will magically be solved by our rules for type parameters. That may be optimistic.

It's interesting that the concept that non-nullable <: unspec <: nullable, which seems so useful in checking ordinary type conversions, seems to be an almost useless tool in hammering this one out.

Agreed. Given that we'd like "more" from unspec here and in #33 and maybe "less" from it in #69, I'm increasingly drawn to models that throw the straightforward rule out, focus on the difficult cases, and figure that the straightforward rule will fall naturally out of them. This ties into @mernst's framing (via @kevin1e100) of unspec as an existential type.

[edit: The straightforward rule may do more harm than good, since it gives the wrong answer here. That wrong answer is at least a contributor to some of the terminology discussion in, e.g., https://github.com/google/codeanalysis-annotations/issues/62 .]

I think maybe this situation is a consequence of the inherent asymmetry of Java allowing <T extends> but not <T super>.)

(I don't think I follow this.)

cpovirk commented 4 years ago

2b. Foo<@Nullable Bar>... By suppressing... the warning, the developer is in effect saying "T could safely be marked as <T extends @Nullable SomeType> except for some (possibly empty) set of issues, all of which I am taking responsibility for steering clear of now."

I see 2 distinct claims here, as you've already hinted:

  1. "If this type were fully annotated, Foo<@Nullable Bar> would be a valid way to instantiate it."

We agree that this might or might not be true, depending on the specifics of Foo.

  1. "If this type were fully annotated, the substitution rules for T in Foo<@Nullable Bar> would produce the same nullness values as they do for the unannotated type."

For Foo<@Nullable Bar> specifically, I think this is true in general -- or we should aim to make it true.

This is a stronger claim than you were making. To put my claim in your terms: Provided that Foo's type parameter should have been declared as <T extends @Nullable SomeType>, then there are no issues with instantiating it as Foo<@Nullable Bar> and using it, even in the absence of annotations on Foo.

I think this works out, at least under models 1 and 2: Foo's bare references to T will be interpreted as @Nullable. That is almost certainly correct, since we rarely see cases in which it makes sense to override a use of parametric T to be @NotNull. (And like I said above about #60, I see this more as a reason to discourage/forbid such overriding than a reason to issue warnings about later uses of the methods on the Foo<@Nullable Bar>. This is of course a larger issue (part of #4), which hopefully I will write more about tomorrow.)

This suggests to me:

Again, models 1 and 2 get this right. 3 gets it kind of backward, though at least the end result is still sound.

2c. Foo<@NotNull Bar> is perhaps the same situation as 2b....

  3. ... For unannotated appearances of T, if we accept my description above of the sort of assertion the developer is trying to make, then I think that supports letting the nullness of the type argument win. I think this is model 2 in Chris's comment?

I edited my previous post to reflect that I was wrong here: This is indeed only model 2.

As for Foo<@NotNull Bar> specifically, let's go back to my 2 claims:

  1. "If this type were fully annotated, Foo<@NotNull Bar> would be a valid way to instantiate it."

As I said in my previous post, I think that we should treat this claim as always true.

  1. "If this type were fully annotated, the substitution rules for T in Foo<@NotNull Bar> would produce the same nullness values as they do for the unannotated type."

I think we agree that this might or might not be true, depending on the specifics of Foo.

This suggests to me:

Models 1 and 2 differ in how they handle this. I like model 1 because it matches these bullets. Model 2 can still be sound, but it has to get it "backward": It won't warn at usage sites, so it has to warn when instantiating. (The 2 models differ in other ways, and I like additional things about model 1, but the difference I'm focusing on here is the one that's most relevant.)

kevin1e100 commented 4 years ago

(Thanks for your diligent edits, Chris, I think/hope I understand what you were getting at now.)

One other thing to mention (again) in this context is issue #62 which offers a way of sidestepping this problem by not considering unannotated type parameters as "extends @NullnessUnspecified", and instead consider them "extends @Nullable", IIUC (see the "unconstrained" item in the initial comment in issue #62).

This alludes to something that I think there's agreement on: when parameterizing an unannotated class, type arguments with any nullness should be allowed (maybe with warnings). That is, given an unannotated class Foo<T> {...}, both parameterizations Foo<@Nullable Bar> and Foo<@NotNull Bar> should to a first approximation be allowed.

I think in my mind I've been putting the soundness bar pretty (too?) high here: allowing parameterizations of unannotated generic classes to be treated soundly regardless of what might go on inside the unannotated generic class. It finally occurred to me today that that's more ambitious than many sound modular checks, which often assume that they will have an opportunity to check all code, which would include unannotated code. I have in my mind associated unannotated code with "probably won't be checked" though, and concluded that we should have a way of still soundly analyzing uses of such unannotated code. That's another one of my "design principles" (Chris listed some others above), I guess, but I wanted to explicitly acknowledge it and recognize that it's a taller order than I had previously realized.

Anyway, I think I'm coming around to Chris's proposal. That is, assuming class Foo<T> is unannotated:

.... which is exactly what kotlinc already does, so it's win-win.

Am I missing something? Are we happy with this?

kevinb9n commented 4 years ago

@kevin1e100 proposed a couple relevant nice-to-have principles:

  • Running a checker in "lenient mode" over unannotated files should succeed.

Yes, I think this should follow directly from the fact that @NullUnspec as we define it communicates zero semantic nullness information. So if a checker fails for some reason, it cannot be due to information it got from our annotations, because there was no such information.

  • Our rules should permit a sound checker (even when compiling against unannotated code -- a differentiator relative to even the existing Checker Framework), even though most tools might never implement a "strict mode." We expect to decide not to be sound in various cases, but those should be conscious decisions.

I think whether we permit a sound checker will be the easy part if we stick to our guns on "we don't dictate checker behavior". The more interesting issue is whether our annotations provide close to as much information as those checkers need, or they need to layer a LOT more stuff on top.

2a. I think Foo<@NullUnspec Bar> should be fine (maybe subject to #69).

I definitely agree that it's not an error: This ties into "Running a checker in 'lenient mode' over unannotated files should succeed."

I do wonder if "should be fine" will mean "may generate a warning."

I would hope not, because it's the least risky way to parameterize such an unannotated type; the warnings will come from usages of that type.

(As noted above, it seems likely that most tools will never be run in "strict mode." Maybe some tools won't even support generating warnings at all, instead silently permitting code with "warnings?" This might simplify implementation.)

I'm nervous about making an assumption like that. I also think we shouldn't think of it as a global mode, but consider that people will suppress that type of warning for whole files or directories when it's too much of a pain, but may be happy to leave it on in others and make at least a token attempt to satisfy the strict requirements spewing forth.

2c. Foo<@NotNull Bar> is perhaps the same situation as 2b, except that the developer is either asserting that T's bound could have been marked @Nullable or asserting that it could have been marked @NotNull. But I think it works out the same. Maybe.

Foo<@NotNull Bar> is potentially different because it should always be a valid way to instantiate Foo, whether its T should have been T extends @Nullable or T extends @NotNull.

But, if we follow model 2, then I do think we should recommend a warning here for soundness. This is a reason that I like models 1 and 3.

I think we certainly need a warning (of the possibly-"special" kind we were discussing), since any T-returning method might be returning null.

But but, if few people are going to run with warnings on in the first place, maybe there's no reason to care too much about this.

(Just reiterating that I feel it's not a good idea for us to think this way; let's discuss.)

It's interesting that the concept that non-nullable <: unspec <: nullable, which seems so useful in checking ordinary type conversions, seems to be an almost useless tool in hammering this one out. I think maybe this situation is a consequence of the inherent asymmetry of Java allowing <T extends> but not <T super>.

(I don't think I follow this.)

Note that for class Foo<@Out T extends @NullUnspec Object> (#72) the subtyping relationship really does give us everything we need. We know that Foo<@Nullable Bar> is safe and we know that Foo<@NotNull Bar> is not, just as the subtyping indicates.

It's for types that consume that the problem comes in. If type params could have lower bounds, then consumer types would logically use those instead, and then when unannotated we would see it as <T super @NullUnspec bottom-type>, and I think generally the right things would happen based on the subtyping relationship above. But they can't, and I'm arguing in #31 and #60 that we should not innovatively introduce that concept either, and I think that's kinda what produces this asymmetry.
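A rough sketch of that asymmetry (the @Out/@In variance annotations are the hypothetical ones from #72, and the lower bound in the comment is not something Java lets us write):

// Producer-only type: subtyping alone tells us what's safe.
// Source<@Nullable Bar> is safe; Source<@NotNull Bar> is not (per the subtyping above).
class Source<@Out T extends @NullUnspec Object> {
  T next();
}

// Consumer-only type: logically we'd want a lower bound, something like
// <T super @NullUnspec bottom-type>, but Java has no "T super ..." form,
// so the upper bound has to stand in for it.
class Sink<@In T extends @NullUnspec Object> {
  void accept(T value);
}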

cpovirk commented 4 years ago

One other thing to mention (again) in this context is issue #62, which offers a way of sidestepping this problem by not considering unannotated type parameters as "extends @NullnessUnspecified" and instead considering them "extends @Nullable", IIUC (see the "unconstrained" item in the initial comment in issue #62).

This alludes to something that I think there's agreement on: when parameterizing an unannotated class, type arguments with any nullness should be allowed (maybe with warnings). That is, given an unannotated class Foo<T> {...}, both parameterizations Foo<@Nullable Bar> and Foo<@NotNull Bar> should to a first approximation be allowed.

Yeah, I think there is agreement on this. In that sense, unannotated type parameters are very much like extends @Nullable. However, given that we're likely to warn at instantiation time and/or at usage sites, I am skittish about saying that we outright "treat them like extends @Nullable" -- even though, from the perspective of a user in "lenient mode," that's almost exactly how it will look.

And, diving deeper, since I can't resist... :) Even in lenient mode, it's still only "almost" exactly. That's because, with BoxOrEmpty<T extends @NullnessUnspecified Object>, we're likely to permit (with warning):

BoxOrEmpty<@NotNull Object> box = ...;
box.setOrClear(null);

But with BoxOrEmpty<T extends @Nullable Object>, that's likely to be an outright error -- and wrongly so.

(And I say only "likely to be an outright error" here because that actually depends on yet another design question that I want to get to at the bottom of this post :))

I think in my mind I've been putting the soundness bar pretty (too?) high here: allow treating parameterizations of unannotated generic classes soundly regardless of what might go on in the unannotated generic class. It finally occurred to me today that that's more ambitious than many sound modular checks.... I wanted to explicitly acknowledge it and recognize that it's a taller order than I had previously realized.

It's a good point that this is a high bar. This should be part of discussions @kevinb9n proposes about the use of "strict mode" in general.

(I still kind of want to just say "I'm pretty sure we can reach the bar, so let's do it!" We'll see if that optimism holds up :))

  • In the presumably unusual case that a type variable use T in Foo's declaration is annotated then the annotation wins.

[edit: I don't want it to straight-up unconditionally win. That's because:]

This is one more wrinkle that I want to write up: What I really want is the following:

After substitution, I want to consider the nullness of every usage derived from T to be jointly influenced by two things:

  1. by the nullness that T is instantiated with, and...
  2. by one of the following:
    • its explicit annotation (as in @Nullable T get()), or, if it has no explicit annotation, by...
    • the defaulting annotation in effect (which our current proposal does not consider)

Given those two inputs, the rule I want is again to take the "most nullable" of the two values.

I am about to leave for the weekend, so I'm not going to collect my thoughts well enough to say much more. But my overall goal is to push us even closer to the Kotlin model (I think?): In Kotlin, you can make a type "more nullable" by sticking a ? on it, but you can't make it "less nullable." This is what I'm aiming for, and that's why I want to omit the concept of "projecting to @NotNull," as permitted by your final bullet point, the one I quoted above. (And it leads into why I want to omit @NotNull entirely.)

The difference in substitution rules, summarized:

I am still working to express this in a clear way :\ Sorry for the roughness at the moment.

cpovirk commented 4 years ago

(It may have been ill-advised for me to post anything in a rush. If my post makes no sense, don't tear your hair out decoding it. I will do better next week. Thanks for your continuing patience and dedication to this.)

cpovirk commented 4 years ago

Here's an attempt to specify my proposal: To compute the effective nullability specifier for the usage of a type variable after substitution:

(Note: Assume that @NotNull and @DefaultNullable do not exist in this world, for reasons that may or may not become clear from the rules.)

We can look at this as two rounds of "pasting." Here's an example class and usage:

[@DefaultNullnessUnspecified]
class BoxOrEmpty<T> {
  T getOrNull();
}

BoxOrEmpty<@Nullable String> box = ...;
box.getOrNull(); // <== the usage

First, since there is no explicit annotation on the return type of getOrNull, we "paste" the default:

@NullnessUnspecified T getOrNull();

Then, we "paste" the value of T:

@NullnessUnspecified @Nullable String getOrNull();

Then we pick the "more nullable" annotation:

@Nullable String getOrNull();

[edit: More formally, we'd follow my mental model for @NullnessUnspecified, which requires forking the code into one version with [@NotNull] @Nullable String getOrNull() and another version with @Nullable @Nullable String getOrNull(). Then we pick the "more nullable" value in each case. In a world without @NullnessUnspecified, that's as simple as asking "Did we see @Nullable or not?" (well, at least when setting aside parametric nullness). In both cases, the answer is "yes," so we always end up with @Nullable String getOrNull().]
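One way to read this is as a join over a three-point ordering of nullness values (treating unspecified as sitting between the other two, per non-nullable <: unspec <: nullable; the edit above handles unspecified nullness by forking, so this flat ordering is a simplification). A purely illustrative sketch; the enum and method names are mine, not part of any proposal:

enum Nullness {
  NOT_NULL, UNSPECIFIED, NULLABLE;  // ordered from least to most nullable

  // The "more nullable" of two values.
  static Nullness mostNullable(Nullness a, Nullness b) {
    return a.ordinal() >= b.ordinal() ? a : b;
  }

  // Effective nullness of a type-variable usage after substitution:
  // the annotation or default in effect at the usage, joined with the
  // type argument's nullness.
  static Nullness substitute(Nullness annotationOrDefault, Nullness typeArgument) {
    return mostNullable(annotationOrDefault, typeArgument);
  }
}

// The BoxOrEmpty walk-through: join(UNSPECIFIED, NULLABLE) == NULLABLE.
Nullness result = Nullness.substitute(Nullness.UNSPECIFIED, Nullness.NULLABLE);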

The pros I see of this model include:

Plus some other pros that I've already noted, some of which don't require every last detail of this model:

[*] edit: I would have to think about this more, but: If we adopt this proposal (including omitting @NotNull and @DefaultNullable), then maybe our spec, users, and implementers could apply either of two equivalent models: They can use the model described above, or they can assume that both the defaulting annotation in effect and the explicit annotation on the usage site contribute to the effective nullability. This would be possible because there would be no explicit annotation that is "less nullable" than any defaulting annotation. I'm not sure if I like that model better, but I like the idea that everything (maybe) works out no matter which model users choose.

cpovirk commented 4 years ago

strict mode

I got a sense from @kevin1e100 and @eaftan on Thursday that they didn't anticipate much use of strict mode. This fit with what I thought I remembered from the Jetbrains folks, who, presumably based on their experience with Kotlin, saw strict mode as more of an afterthought.

Of course, on the other side, we have the Checker Framework and Infer, some users of which want to see a hard guarantee from the effort they've put into annotating their codebases.

In any case, I am personally currently optimistic that we can specify rules to at least permit a sound checker based only on our annotations. (Of course that checker may be inconvenient to use without its additional checker-specific smarts. And of course we may make our own compromises when annotating APIs like the JDK and Guava. But I'd like for the core rules to produce warnings or errors for any unsound input.)

Foo<@NotNull Bar>

Part of the goal of my model 1 is to avoid warning merely for instantiating Foo<@NotNull Bar> (and often to avoid warning for using it, too). I grant that, as you say above, it may also be possible to avoid warnings by instead instantiating as Foo<@NullUnspec Bar> and using that, but:

non-nullable <: unspec <: nullable

I'm still confused. Even if we have...

class Foo<@Out T extends @NullUnspec Object>
class Bar<@In T super @NullUnspec bottomtype>

...then the subtyping rules still don't let us write Foo<@Nullable Object> or Foo<@NotNull Object>, right?

cpovirk commented 4 years ago

...then the subtyping rules still don't let us write Foo<@Nullable Object> or Foo<@NotNull Object>, right?

Oh, sorry, it sounds like you're presenting forbidding those as a feature. Do I have that right now? I had been figuring we'd want to permit them (and I've gotten the same impression from @kevin1e100 and @stephan-herrmann (#62)), and I now realize that I'm projecting that onto you.

Anyway, my reasoning:

(Plus, it fits my mental model :))

As you say, much of this is moot if we don't introduce lower bounds for nullness. But irrespective of that, I think that permitting Foo<@Nullable Object> is a good thing. And I increasingly think that non-nullable <: unspec <: nullable, while almost certainly technically true, encourages us to think in a simplified model that does more harm than good.

[*] edit: Originally I used Reference instead of Supplier here, but I changed it. That's because I would probably annotate Reference as Reference<T [extends @NotNull Object]>, forbidding Reference<@Nullable Object>. (The class can still support all callers because its constructor would accept a @Nullable T, and its get method would return a @Nullable T.)

cpovirk commented 4 years ago

Dumping related notes, part 1:

In defense of "most nullable" (and shading into #4)

I see "most nullable" as the most natural model. When I think of a prototypical generic API, I think of some kind of container (though my reasoning doesn't apply only to containers). There are 2 ways for a method on that API to return null:

  1. The container can contain null as a value.
  2. A method returns null as a sentinel.

Based on this, I see nullability primarily as "additive": If we are unsure whether a container can contain null as a value, or if we're unsure whether a method returns null as a sentinel, then we're unsure whether the method can return null. Unless, of course, we know that one of those holds true, in which case we're sure that the method can return null.
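As a concrete illustration of those two sources of null, consider a hypothetical cache-like class (everything here is mine, not from any proposal):

import java.util.HashMap;
import java.util.Map;

class SimpleCache<K, V> {
  private final Map<K, V> entries = new HashMap<>();

  // Two independent reasons lookup() can yield null:
  //  1. the cache may contain null as a value (if V allows it), and
  //  2. null doubles as the "no entry for this key" sentinel.
  // Uncertainty about either one makes the result possibly null, which is
  // the sense in which nullability is "additive" here.
  V lookup(K key) {
    return entries.get(key);
  }
}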

I grant that it is possible for an API to implement something like the inverse of (2). That is, its method could replace a null element with some other sentinel value. To support this behavior, we'd need a concept of "projecting" a possibly @Nullable value back to @NotNull.
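A sketch of the kind of API that would need such a projection (hypothetical; the @NotNull on the result is exactly the projection syntax in question):

class DefaultingBox<T> {
  private T value;
  private final T fallback;  // a non-null replacement supplied at construction

  DefaultingBox(T fallback) { this.fallback = fallback; }

  void set(T value) { this.value = value; }

  // Replaces a null element with the fallback, so the result is never null.
  // Expressing that requires "projecting" a possibly @Nullable T back to @NotNull.
  @NotNull T getOrFallback() {
    return value != null ? value : fallback;
  }
}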

However, I think this is the far less common case:

Building on that second point: I would go so far as to speculate that it would not even occur to the average Kotlin user to look for syntax for @NotNull/projecting. Now, Java is not Kotlin, so maybe "most nullable" isn't the right choice here. But to me, the most persuasive arguments against "most nullable" will be those that distinguish Java from Kotlin in some way:

I think that both of these are worth exploring further. The main thing I want is to avoid including @NotNull/projecting because of a general sense that "People will need it often enough," "Every other system includes it," or "It doesn't limit our options." I contend that Kotlin and the "most nullable" approach (with its advantages, discussed above) argue against those points.

cpovirk commented 4 years ago

addendum: Hopefully, my previous post at least strongly implied one reason that I don't like @NotNull: Users might think they can write @NotNull T to produce a type that can never contain null, but that's not how it would work under my proposal. And if we don't have @NotNull, then we shouldn't have @DefaultNullable, either, because users won't have a way to override that default to say "not null" when they need to.

Dumping related notes, part 2 (the last part? for now?):

I think that parametric nullability works out in the same way under my proposal as under the current design proposal. Example:

[@DefaultNullnessUnspecified]
class Foo<T> {
  T bar() {
    return null;
  }
}

My rules give us:

[@DefaultNotNull]
class Foo<T extends @Nullable Object> {
  @Nullable T bar(); // returns @Nullable
}
[@DefaultNotNull]
class Foo<T extends Object> {
  @Nullable T bar(); // returns @Nullable
}
[@DefaultNotNull]
class Foo<T extends @Nullable Object> {
  T bar(); // returns parametric nullness
}
[@DefaultNotNull]
class Foo<T extends Object> {
  T bar(); // returns not null
}

The return null implementation passes analysis in the first 2 versions but not the other 2, so we issue a warning. That sounds like the right behavior, and it's consistent with the current design proposal.

A slightly more interesting case is for a parameter of type T:

[@DefaultNullnessUnspecified]
class Foo<T> {
  void bar(T t) {
    t.toString();
  }
}

The rules play out similarly to the first example. The interesting part is that only one of the 4 versions passes analysis -- specifically, the one that chooses "not null" for both type usages. Still, that's enough to pass analysis with a warning. And again, the warning sounds like the right behavior, and it's consistent with the current design proposal.

kevin1e100 commented 4 years ago

@cpovirk would you be open to splitting the discussion of how to derive nullness of type variable usages from this issue (which ostensibly was about how to think about UnannotatedFoo<@Nullable Bar>)? If so, could you please file that issue? I do want to discuss the merits of the "most nullable" idea vs. "explicit annotation always wins", but I've been hesitant to do that here. Either way is fine, just let me know.

cpovirk commented 4 years ago

@kevin1e100 Good call. I started to do that in #75, but then I discovered that we already have #13. I will move some things over there (both from here and from the new issue...). Feel free to get in ahead of me if you'd like.

Should I also create an issue, "Are type usages influenced by the defaulting annotation in effect?" Or is that better handled here (or elsewhere, or better postponed for later)?

cpovirk commented 4 years ago

Argh; I keep second-guessing myself about what belongs on this issue vs. somewhere else.

Earlier in the thread, kevinb9n summarized the questions at stake:

The premise here is class Foo<T extends @NullUnspec SomeType> (irrespective of whether explicit, via defaulting or default-defaulting), and to close this issue we will need to decide:

  1. What is the treatment (terminology, behavior, etc.) of type usages of the type variable T itself inside that scope (annotated or not)?
  2. What type arguments will be legal/illegal/"warning"?
  3. Given a type argument, how does substitution operate on type usages of T?

All those same questions could also be asked about class Foo<T extends @NotNull SomeType> and class Foo<T extends @Nullable SomeType>:

1.

If we want this issue to be only about @NullnessUnspecified bounds (and maybe @NotNull bounds, but I don't think there's controversy there), then maybe we should:

But as always, everything is intertwined, especially the various parts of our search for an understanding of @NullnessUnspecified.

Do you (@kevin1e100) have a vision for how to split this up and what to discuss where? Moving most discussion to #13 could help separate concerns, or it could make the discussion harder to follow by splitting it across threads. Or maybe it's our chance to run away from this thread in favor of a new, better-structured one :)

I can adapt to whatever split. Or I can arbitrarily make one up, but I want to make sure the split is actually addressing your needs, and I've been floundering at figuring out the best way to do that :(

kevin1e100 commented 4 years ago

So I keep thinking that there are several different goals at play here, but I'm having trouble teasing them apart, so maybe I'm just wrong to think there are several issues:

  1. one issue seems to be the question of whether, for interpreting a not-explicitly-annotated type variable use, the surrounding default should matter. Best I can tell, there are ergonomics considerations here around partially annotated files, how best to facilitate successively annotating more and more of a class, and consistency between interpreting type variable usages and regular type usages.
  2. the desire to make it so parameterizing an unannotated class with @Nullable "makes" the substituted type variables @Nullable, but parameterizing with @NotNull results in unspecified.
  3. the desire to make nullness "additive", ie., that @Nullable can only ever be added to a given type and never removed from that type.
  4. whether explicit annotations on type usages can ever be ignored

I believe these could maybe be handled separately. I'm separating out the 4th point because it seems like we can always give explicit annotations precedence if we so choose. I feel that the 2nd point has to do with this issue and "more nullable" has bearing on it, while the first one belongs in #13 and isn't strictly related to "more nullable", maybe? But "more nullable" has broader implications than the 2nd point, namely for the 3rd point, and possibly also for the 1st point. I guess now that I've written that I feel better discussing "more nullable" here, but I still wonder if a whole new issue would be better.

Why do I say I see the first issue as separate? It seems that "more nullable" says to consider type argument nullness in addition to the annotations on the generic class, but that seems separate from how we interpret the annotations on the generic class. That is, even without considering defaults for type variables, "more nullable" seems to make some sense:

@DefaultNullable
class Foo<T extends @NullnessUnspecified Object> {
  T bar();
}

The first issue is about whether @DefaultNullable should apply to bar's return type. But let's say it doesn't, meaning we'll ignore defaults when reasoning about occurrences of T, including after parameterization. Then "more nullable" is still useful to determine that bar's return type is @Nullable for Foo<@Nullable String> but @NullnessUnspecified for Foo<@NotNull String>, or am I missing something?
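Spelled out as usage sites, the claim is (using the Foo above):

Foo<@Nullable String> a = ...;
a.bar();  // join(bound = unspecified, argument = @Nullable)  ==>  @Nullable String

Foo<@NotNull String> b = ...;
b.bar();  // join(bound = unspecified, argument = @NotNull)   ==>  @NullnessUnspecified String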

cpovirk commented 4 years ago

"Most nullness" comes up in your example only if we apply something like model 3. (Links to models again: effects, rules.) In contrast, models 1 and 2 don't take the bound into consideration during substitution, only during the original instantiation. That's all fine: We can certainly discuss "most nullable" in the context of model 3. But I think we're stuck discussing it in some context.

Here's a proposal: I can write up a bunch of code snippets (like BoxOrEmpty above), and we can discuss, in an ideal world, what we'd like a tool to do for each. By focusing directly on behavior in specific cases, we can largely (though not completely) avoid worrying about the interactions between general rules.

At the same time, of course, we can keep one eye on top-down rules like:

kevin1e100 commented 4 years ago

I definitely support writing a comprehensive doc, instead of trying to fit the various implications of your "additive" model into various issues. Again, maybe I'm over-thinking this, but yes, what I was trying to say was that there seems to be a "model 3a" where model 3 is combined with "more nullable" (and you could further refine that to give explicit type use annotations precedence or not) while still not using defaults (as in model 1), just type parameter bounds.

Another way of saying that is that "model 1" seems to differ from "model 3" in three seemingly distinct ways:

It seems to me that you can vary these dimensions separately. For instance, there also seems to be a model 1a where explicit annotations are still honored over everything else. Disclaimer: a lot of combinations may well not work; I haven't fully thought through all of them.

cpovirk commented 4 years ago

Thanks. I agree that 3a is a model we can consider, and I agree that there may be many other possible models. Hopefully we'll eventually narrow it back down to 3 or so "serious" options. And hopefully that will get easier when I get together a list of examples.

I was going to say that model 1 doesn't consider bounds at all, only defaults and the type argument. But I have to put a couple qualifications on that. I do still think it's the right starting point for understanding the model, but the qualifications are important, too:

(1) a type argument of unspecified nullness

@DefaultNotNull
interface Foo<T [extends @NotNull Object]> {}

[@DefaultNullnessUnspecified]
class Bar {
  Foo<T> foo;
}

Here, we would probably consider foo to be a Foo<[@NotNull] T> rather than a Foo<@NullnessUnspecified T>. But I see that as a result of yet another rule, which is either:

(2) the wildcard case

class Bar {
  Foo<? extends @Nullable Object> foo;
}

Here, foo has an effective bound of @NotNull on the type parameter, even though the wildcard bound is more general, because this is how wildcards work in general: an EnumSet<? extends Object> is "really" an EnumSet<? extends Enum<?>>.

cpovirk commented 4 years ago

Changing the subject somewhat: Another spin on my mental model:

A warning from a checker is a way of saying "When this class is annotated for nullness someday, this operation might become an error (and so might some later, related operations)." (Of course, we hope that it will usually become a non-warning, non-error :))

This fits with the idea that:

kevin1e100 commented 4 years ago

I think I got myself confused above when claiming that "model 1" considers bounds.  I was thinking of this case:

[@DefaultNullnessUnspecified]  // or @DefaultNullable
class Foo<T> {
  @DefaultNotNull
  T bar(T baz) {...}
}

I guess the intention here is actually that 

Do I have that (now) right? This particular effect of placing @DefaultNotNull on a method has me pretty confused for some reason, but T's bound is clearly ignored (where I had confused myself into believing it wasn't). Anyway, that's just to clarify; I don't think it changes any conclusions from recent comments.

cpovirk commented 4 years ago

I think that matches what I'm thinking, yes.

cpovirk commented 4 years ago

One more general point about "most nullable" before I follow through on my promise of a list of code snippets to consider individually:

@kevinb9n has raised the concern that we might actually want "most conservative" / "pessimistic," similar to how we treat parametric nullability. A few thoughts:

First: To some degree, my "more nullable" rule already gives us this: When we instantiate an unannotated class with @NotNull Foo, my rule turns type-variable usages into @NullnessUnspecified Foo. Given that, we can produce warnings for any operation that might be unsafe. We could be extra conservative by requiring @NotNull in places, but this turns "possibly unsafe" cases into outright errors instead of just warnings. (For example, it prevents the implementation of MyMap.get from returning null as required.) I don't think anyone is advocating for that specific outcome. So let's figure that we'd still want @NullnessUnspecified in that case (even though that's arguably not "most conservative") and say that "most conservative" affects only the other half of the picture.
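A sketch of that MyMap.get concern (MyMap is the hypothetical class from the parenthetical above):

import java.util.HashMap;
import java.util.Map;

class MyMap<K, V> {
  private final Map<K, V> delegate = new HashMap<>();

  // Map-style lookup: returning null when the key is absent is part of the
  // contract. If the rules forced this result type to @NotNull, the required
  // null result would become an outright error rather than a warning.
  V get(K key) {
    return delegate.get(key);
  }
}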

The other half of the picture is instantiating an unannotated class with @Nullable. Let's look at it first under my "most nullable" rule. That rule implies that:

We could instead use the "more conservative" @NullnessUnspecified in those cases. This gives us something kind of like model 3.

Why isn't it exactly model 3? Model 3 applies @NullnessUnspecified even more widely: It lets calls to methods assume they receive @NullnessUnspecified return values, and it lets implementations of methods assume they receive @NullnessUnspecified parameters. The "most conservative" model would make both of those @Nullable.

Except "most conservative" is even more complex than that: When a method accepts a List<T>, should T be @NullnessUnspecified or @Nullable? It's not clear if it's an "input" or an "output," so we don't know which nullness is "more conservative."

More abstractly, I see "most conservative" as an uncomfortable middle ground. I see the other two solutions as more principled:

The middle ground of "most conservative" ends up saying: "If you see T instantiated with a @Nullable type, then sometimes you assume that it might be overridden to a non-null type, and other times you assume that it can't be overridden to a non-null type." It's kind of "meta-conservative": It's trying to pick a rule that would be most conservative no matter what other rules we set :)

So yes, I am yet again turning this into a point about projecting to @NotNull. I think that our position on @NotNull is tied to our position on @NullnessUnspecified:

(Or we replace every usage of T with exactly the type it was instantiated with. We know that this might be unsafe, so we warn you the moment you instantiate such a type, after which you're on your own. (Or, if you instantiate it with @NullnessUnspecified, we're back to assuming nothing about the type-variable usages.) This is model 2.)

I continue to like model 1 ("most nullable"), including disallowing projecting to @NotNull. This model is based on a practical observation:

I see further evidence of this in the success of Kotlin's type system, which permits projecting to @Nullable but not @NotNull.

stephan-herrmann commented 4 years ago

Unable to read through all that is said here, let me at least comment on this:

(There is also model 3.5, the Eclipse model, which I believe is somewhat similar to 3 but differs in a few respects: It applies only when instantiating with @NotNull and only to return types that are naked type variables. (This is "Dealing with legacy" in GenericsEclipsePOV.) In other respects, it's like model 2, and maybe you can squint to see a little of model 1 in that it treats @NotNull and @Nullable differently.)

T extends @NullnessUnspecified hardly occurs in the Eclipse approach:

In all cases we let @Foo prevail, but under the mentioned heuristics give a warning that the reasoning leading to this may be weak.

From a quick scan, it seems that here we only discuss arguments / return types of the plain shape @Foo T, not anything like List<@Foo T>, right?

cpovirk commented 4 years ago

My expectation would be that the rules that have an effect on @Foo T would also have an effect on List<@Foo T>. But it's possible that the rules would be explicitly different -- possibly because we want a "most conservative"/"pessimistic" approach or because we decide the fully general rules would have to be too complex relative to the value they provide.

So:

class Lib<T> {
  T one();
  List<T> many();
}

...
Lib<@NotNull String> lib;

My thinking is that both usages of T end up with the same effective nullability specifier -- and the same is true if Lib uses something like @Nullable T instead of plain T. But again, I don't think much of anything has been settled.

(Hopefully that's what you were asking?)

stephan-herrmann commented 4 years ago

For top-level types we know the direction of data flow, so it's easier to see where warnings are mandated. OTOH, passing a List<T> creates a situation of shared access to the list instance, where either side may break assumptions on the other side if we're not careful.

cpovirk commented 4 years ago

Here's a doc that covers 4×4 + 4×4 cases [edit: and growing...]: https://docs.google.com/document/d/1Ut6Gu8--UZsZInCko0zJRy7Kz3mqPT2AT7kcApM4RoE/edit As I note at the bottom, this is still far from comprehensive, but it's a start.

I'm now thinking about how hard it would be to prototype a mini-checker that operates only on these code snippets, offers configurable rules, and lets you diff the results across rule sets and against the ideal outcome.

cpovirk commented 4 years ago

An amendment to something I said before:

I recently said that I was down on the idea of different rules for what @kevinb9n is considering calling "in-types" and "out-types".

I still stand by that if I am more precise about it: The thing I'm down on is what we might call different "substitution" (and maybe "defaulting") rules for in-types vs. out-types.

Very broadly speaking, though, I do expect for us to sometimes see "different behavior" for in-types and out-types. This behavior would arise naturally when, for example, we handle parametric nullability: Given a type parameter <T extends @Nullable Object>, we'd forbid dereferencing a parameter of type T (it might be null!), but we'd also forbid returning null from a method that returns T (it might not be null!). This is in some sense asymmetric: "Is T @Nullable or not?"
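In code, roughly (annotation names as in the proposal; the comments stand in for checker diagnostics):

class Holder<T extends @Nullable Object> {
  private T value;

  void set(T value) {
    value.toString();  // forbidden: a parameter of type T might be null
    this.value = value;
  }

  T get() {
    return null;       // also forbidden: T might be instantiated with a non-null type
  }
}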

But it's still symmetric in terms of substitution rules. That is, we'd still consider the type to be T in both cases: It's just that we're checking conversion rules in "the opposite direction": @Nullable isn't convertible to T, but neither is T convertible to @NotNull.

(Similar logic may apply for unspecified nullness, but I haven't thought it through deeply.)

So even though we see "different behavior" for in-types and out-types, it's not because we have different substitution rules for those types: It's just that, thanks to normal type-checking (and sometimes PECS), we sometimes check whether T is convertible to a given nullness, and we sometimes check the reverse. And for "in-between" types like parametric nullness, the two don't have the completely opposite behavior that we'd expect from a simpler system with only @NotNull and @Nullable.

I think this is probably all reasonably well understood already, but my earlier comments may have been misleading. Plus, I'm still not 100% sure that I have expressed it right :)

kevin1e100 commented 4 years ago

Two follow-ups to my previous comment that was trying to tease apart the concerns seemingly at work in the "more nullable" idea @cpovirk has raised on this thread (and elsewhere):

  1. since we're not pursuing default annotations on methods for the time being (#43), partially annotated files don't seem to be as important ATM: either a class is defaulted/opted-in, or not, IIUC.

  2. experimenting with Kotlin a bit, it seems to always honor JSR305's @Nonnull, even when placed on a type variable usage (e.g., @Nonnull T foo() { return ... }). I'm not aware of any existing tool that would ignore @No[nt][Nn]ull anywhere, though I haven't explicitly tried other tools in preparation for this post. I still grant that there are (almost) no valid use cases for annotating type variable usages this way, but I really struggle to see the point of ignoring explicit @Nonnull annotations in this case, not least because that seems to be at odds with how existing annotations are treated: since it would, e.g., be a breaking change for Kotlin to start ignoring JSR @Nonnull, it seems extremely confusing to ask it to ignore another spelling of that annotation.
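For reference, the kind of declaration being described (javax.annotation.Nonnull is the JSR-305 annotation; the class is hypothetical):

import javax.annotation.Nonnull;

class Box<T> {
  private T value;

  // Per the observation above, Kotlin honors this annotation and treats the
  // return type as non-null, even though it sits on a type-variable usage.
  @Nonnull
  T getOrThrow() {
    if (value == null) {
      throw new IllegalStateException();
    }
    return value;
  }
}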