scala / scala3

The Scala 3 compiler, also known as Dotty.
https://dotty.epfl.ch
Apache License 2.0
5.8k stars 1.04k forks source link

warn for `new Array[T]` when `T` is not nullable #15763

Open olhotak opened 2 years ago

olhotak commented 2 years ago

With explicit nulls, one source of unsoundness is array creation using new Array - the array is filled with nulls even if the element type is not nullable. This has been an observed cause of several nullness bugs in the compiler. The latest of these is #15687 fixed by #15748.

At one time we planned to deprecate the raw new Array and prefer library functions: Array.fill and Array.apply for arrays with a non-nullable element type that are guaranteed to be initialized and a new Array.ofNulls function to create nullable arrays, with signature Array.ofNulls[T](size: Int): Array[T|Null].

A simpler solution would be for the compiler to issue a warning for every new Array[T] when T is not a supertype of Null.

A question is whether this should be considered an explicit nulls warning or an initialization warning. That is, should it be issued only with both -Ysafe-init and -Yexplicit-nulls, or even without -Ysafe-init. On one hand, it doesn't really fit with the initialization checker implementation, and more with the checks done for explicit nulls (e.g. in RefChecks). On the other hand, from the point of view of users, there is more precedent for -Ysafe-init to issue warnings, while -Yexplicit-nulls usually changes types, possibly leading to compilation errors. It's debatable whether having an Array[String] full of nulls is a failure to initialize or an incorrect type.

/cc @noti0na1 @liufengyun

som-snytt commented 2 years ago

My new favorite thing for building an array is ArrayBuilder. If I happen to know the target size, then I can give it a hint. I like getting a result that is not a partially filled buffer. If I need Array[A|Null], I can ask for it, where initialization means whatever my result is.

liufengyun commented 2 years ago

A simpler solution would be for the compiler to issue a warning for every new Array[T] when T is not a supertype of Null. ... On the other hand, from the point of view of users, there is more precedent for -Ysafe-init to issue warnings, while -Yexplicit-nulls usually changes types, possibly leading to compilation errors.

For programmers, I think warnings (instead of errors) would have a better user experience, as they do not stop compilation and running of programs. For the proposal above, it seems it can be issued as a warning without any issue.

On the other hand, I expect such warnings will in most cases be false positives. It seems the real-world safety benefits brought by the check above will be limited unless the check can reason locally about the safe initialization of fixed-size arrays (for common idioms).

Meanwhile, the compiler is incapable of checking the validity of bounds of array access, and users can still get out-of-bounds exceptions. Given the usability and safety tradeoffs, it seems doing nothing for arrays is also an option here.

liufengyun commented 2 years ago

It seems the real-world safety benefits brought by the check above will be limited unless the check can reason locally about the safe initialization of fixed-size arrays (for common idioms).

Assuming a fixed-size array must be initialized locally inside the same method where it is created, I'm wondering whether we can use some analysis techniques (e.g., symbolic execution) to support common idioms? We can ask an existing transformer/typer to first mark methods that contain new Array(n), to avoid checking every method.

olhotak commented 2 years ago

I would expect it to be rare that an array is allocated using new Array(n) and then initialized locally within the same method. When the initial elements are known in the same method, usually it is possible to create the array using Array.apply or Array.fill. I would expect the common uses of new Array(n) to be:

  1. When the array elements are initialized sometime later using complicated logic, in different method(s).
  2. When the array elements are never intended to all be initialized, i.e., when the array is intentionally allocated larger than necessary to support adding more elements later, as in an ArrayBuffer.

For the second use case, the type of the array should be T|Null, since it's expected that some of the elements can be null. If we had a warning, the warning could suggest this type.