Closed odersky closed 3 years ago
Isn't erasedValue
a kind of weird fit for this in terms of looking it up? If I look up erasedValue
currently, I get lot's of stuff about inline, match, and macros. As a user then, if I saw that when I was looking up this, I might think I took a wrong turn somewhere.
As long as we're not targeting 3.0 with this change ... because this is not triggered by any feedback coming from the community, and we've already said that further language changes would only be done in response to community feedback. (Or blocking issues, which this is not since it's already an obscure use case, and as is, it works.)
the init checker should check that such variables are not referenced before being assigned to
I think it's generally not achievable without complicating things due to aliasing --- it's the same as enforcing a typestate protocol in a language.
On the other hand, if the object can be aliased, the usage of Option[T]
might be safer for performance-insensitive applications:
- var x: T = _
+ var x: Option[T] = None
@Katrix The documentation of erasedValue
would have to be changed to cover the new use case.
On the other hand, if the object can be aliased, the usage of Option[T] might be safer for performance-insensitive applications:
Yes, but the use case = _
is for performance sensitive applications. I believe it is OK to demand verification by hand that the variable is assigned before use.
Another choice would be to always use null
as the initializer. I.e.
new AbstractIterator[A] {
private[this] var hd: A | Null = null
But then the danger is that we will make use of the initial value since we went through the trouble of defining it:
if hd == null then // initialize
else // use hd, with flow typing we know it is not null
Very tempting but wrong, since null
might be a legal value of type A
. So, I believe _/erasedValue
is better since they do not tempt one to test for null
.
It seems to be at least in Dotty, var x: T = _
are all used in places where we know the concrete type of T
. Therefore, such usage can be eliminated with either T | Null
or a special sentinel value if T = Int
.
In performance-sensitive applications, users can always use a cast to avoid runtime checks.
The example of AbstractIterator
I gave comes straight out of the standard library. There are several others like that.
In performance-sensitive applications, users can always use a cast to avoid runtime checks.
I am not sure I agree. The point is, we want to communicate "this variable is defined but not initialized". That's a common idiom in all imperative languages. Making up a non-sensical initializer and then casting obscures the logic.
"
I am not sure I agree. The point is, we want to communicate "this variable is defined but not initialized". That's a common idiom in all imperative languages. Making up a non-sensical initializer and then casting obscures the logic.
We already have to do that for local var
s. We can do it for fields as well. There's nothing special about fields vs local vars in this. In fact I'm pretty sure I had to invent a non-sensical initializer for local var
s more often than I have needed a = _
for fields.
Maybe compiletime.noValue
instead of compiletime.erasedValue
? That would avoid the confusion.
Unless we can have a check that a variable is always initialized before accessing it (which seems hard to implement), leaving a variable uninitialized is a fundamentally unsafe way of programming, where the programmer basically tells the compiler "trust me on this".
So an unsafe cast seems warranted for this.
var hd: A = null.asInstanceOf[A]
Such a cast is similar in nature to the ones we need in the low-level implementations of things like ArrayBuffer
(which needs casts because it's based on an underlying Array[AnyRef]
).
Having a special feature like compiletime.noValue
would make it look like using uninitialized variables is fine and dandy, legitimate even in non-performance-sensitive code, whereas it can actually lead to type soundness issues!
You can't write . We really need something that can be used for dropping initialization and nowhere else. So, how aboutnull.asInstanceOf[A]
anymore, thankfully
var x: T = notInitialized
That's pretty clear, no? And it would get rid of the obscure use of _
.
One can argue that notInitialized
is unsound and therefore should not be used. I will not be the one to lead the battle to get rid of it in all existing code bases, but if someone else goes ahead, I am sympathetic.
I know I don't bring much value with this, but still: As far as know (not only) in the Scala community, unsafe things are usually named unsafe*
, so how about unsafeNotInitialized
or unsafeUninitialized
?
Maybe we don't need to get rid of this feature altogether, but naming it like this would clearly demarcate that the user is playing with fire here :fire:
unsafeNotInitialized
would work as well, but i think notInitialized
is probably sufficiently clear.
We already have to do that for local vars. We can do it for fields as well. There's nothing special about fields vs local vars in this.
In fact, there is! We might not have a value for a field like here
class Memo[A](x: => A):
private var cached: A = notInitialized
private var known: Boolean = false
def force =
if !known then
known = true
cached = x
cached
But in a local context if we always have a value with which we could initialize a variable. Otherwise, why define the variable at all? Exception: A local context that exports an instance that accesses the variable in a closure. But then we could put the variable in the class of the returned instance itself.
But in a local context if we always have a value with which we could initialize a variable. Otherwise, why define the variable at all?
def lastSuchThat[A](xs: List[A])(p: A => Boolean): A = {
var found: Boolean = false
var result: A = null.asInstanceOf[A] // no initial value here
var rest = xs
while (!rest.isEmpty) {
if (p(rest.head)) {
found = true
result = rest.head
}
rest = rest.tail
}
if (!found)
throw new NoSuchElementException
result
}
Of course we could use an Option[A]
, but only if we don't have the performance-sensitive argument. And if we don't have the performance-sensitive argument, then we also don't have it for fields.
If the feature var x: T = _
is only to support performance-sensitive libraries where T
is generic, then can we have an ad-hoc solution for the ad-hoc usage?
class Memo[A](x: => A):
@embed
private var cached: UnsafeOpt[A] = UnsafeOpt.Empty
def force: A =
if cached.isEmpty then
cached.set(x)
cached.get
The annotation @embed
will inline implementation of Box
to the following after typer:
class Memo[A](x: => A):
private var cached$value: A = _
private var cached$init: Boolean = false
def force =
if !cached$init then
cached$init = true
cached$value = x
cached$value
Of course, the implementation of @embed
has to check that cached
does not leak, i.e. it's effectively private[this]
and never used directly as arguments to methods or assignments.
I believe that an erased notInitialized
value will be a lot simpler than an added annotation. PR to prove it is forthcoming.
@sjrd Yes, that makes sense. In fact a notInitialized
could also make sense for local values. It would require flow typing to check, so that's an additional expense. But since we already have flow typing for nulls, it could be considered.
Apologies for this pedantic comment, but this issue is currently titled: Get rid of _ =
in variable definitions
Shouldn't this read: Get rid of = _
in variable definitions instead?
Indeed. Fixed.
So I still need to read through this thread and catch up, but I want to underscore that the current meaning of this construct, which results in constructors which have no assignment bytecode for those slots, is crucial for performance sensitive areas. Simply taking null and casting it does produce the same result, but at the cost of an added instruction. This mistakes the purpose of the construct: it isn't to obtain a default value, it is to obtain a variable which has an undefined value for all intents and purposes and which is half as costly.
@djspiewak I guess the cast is removed by the erasure phase. Do you have evidence suggesting that initializing a variable with the default value for that type is slower than leaving it uninitialized? I would imagine the JVM compiles these down to the same machine code.
IIRC in Java, leaving uninitialized and initializing with (what Java specifies as) the default value are synonyms.
The cast is removed but not the assignment. This is particularly relevant if the assignment imposes a memory fence, but even without that, the cost is measurable if you're doing enough unavoidable allocations in a critical section.
So I still need to read through this thread and catch up, but I want to underscore that the current meaning of this construct, which results in constructors [..]
[..] but I want to underscore that the current meaning of this construct [..]
underscore
Why not have something called nullOrZero or something like that, which is a utility that returns null or a type's empty value. It doesn't need to be special for initialization but it should be named so that people don't use it to introduce null without realizing it.
How variables should interact with null tracking should be its own discussion. Even with syntax you're introducing the danger of an NPE. And why shouldn't var x: String = null be allowed if = is? So I think it's a separate problem. Either you force variables not initialized to a non null to be | Null for safety, or you say that var fields are special.
On Thu, Jan 28, 2021, 11:40 AM Facsimiler notifications@github.com wrote:
Apologies for this pedantic comment, but this issue is currently titled: Get rid of _ = in variable definitions
Shouldn't this read: Get rid of = _ in variable definitions instead?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-769213647, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUEPUQN42AIHWHCUPLTS4GHQFANCNFSM4WU6TKJQ .
Either you force variables not initialized to a non null to be | Null for safety, or you say that var fields are special.
I mean, var fields are already special.
To be clear, I don't have any problem going through some extra type system pedantry to allow for uninitialized fields, though | Null
isn't quite sufficient since it doesn't cover primitives accurately. My fear if we allow notInitialized
in other general contexts is that it will be abused quite heavily. null.asInstanceof[A]
is already there, yes, but it looks radioactive so people generally don't touch it. When I've used it in code shared with less experienced teams, for example, I'm invariably asked to write a book in comments explaining what's going on. I doubt that notInitialized
will provoke the same aversion, and I don't think that's a good thing.
Honestly the ideal syntax here really would be just leaving off the assignment, but of course that conflicts with abstract
member denotation. This is part of why I actually like the = _
syntax. Underscore is already a heavily overused token within the language, and people seem to understand that it has different meanings in different contexts. Honestly, even removing this particular construct isn't going to do much to move the needle on that. When people see something like var x: Int = _
, they usually understand that it's some syntactic weirdness thing and they either understand it or they make a mental note to come back and figure it out. I've never seen anyone get confused by the fact that this syntax doesn't work in any scope other than the top level of a class/trait/object (though I admit to having tried it in local scopes several times early on in my Scala explorations before I realized that Scala lacks flow typing).
I guess my general conclusion is that this is not really that big of a deal. Changing it to notInitialized
(or anything else) isn't really going to materially demystify _
, particularly when the construct is so rarely used anyway, and introducing a seemingly-orthogonal mechanism for this functionality (notInitialized
or similar) runs the risk of introducing a whole added source of confusion due to the fact that the expression just doesn't make sense in a ton of contexts, but the compiler will either let you use it (amplifying confusion and injecting unsoundness into the language) or disallow it (amplifying other confusion and removing any orthogonality).
Particularly in light of the lateness of the hour -- and it is very late in the game to be having these kinds of conversations! -- my preference would be to just leave it as it is. Changes at this point which remove syntax should really only be considered under extreme circumstances, because otherwise we're never going to be able to, as an ecosystem, land this plane. If status quo is considered an unacceptable outcome, then my second choice would be to lean into the magicness of null.asInstanceOf[A]
and just make that the new = _
replacer. Thus, instead of introducing a magical notInitialized
, we just force people to use null.asInstanceOf[A]
as the assigned expression and give the compiler magic knowledge of this such that it can remove the assignment altogether. This has the advantage of being generally source compatible with Scala 2, though it will result in a performance regression for any code which is cross-compiled back to 2.x, which likely means that we would want to push for additional 2.12.n and 2.13.m versions which backport the magical awareness.
Hopefully it's clear why my preference is to leave it alone. Even a small change here (making null.asInstanceOf[A]
just a bit more magical) has rippling consequences which would be very painful at this stage.
Note: uninitialized
is defined erased
, so it cannot be misused as a regular value.
Erasing is an optimization, that doesn't mean it can't use a more general method.
I think C# has such a general method.
At long as it has null in its name people won't use it where they wouldn't use null, but they should be able to when they want to.
On Mon, Feb 1, 2021, 9:44 AM odersky notifications@github.com wrote:
Note: uninitialized is defined erased, so it cannot be misused as a regular value.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-770908316, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUFLCWNHCN7KUA5L3GDS42443ANCNFSM4WU6TKJQ .
Note: uninitialized is defined erased, so it cannot be misused as a regular value.
What I meant by this is that there's a check that uninitialized
cannot be used as a regular value.
The only place it is legal in normal code is as the initializer of a mutable field.
(theoretically you could use it also in typelevel code, as long as no actual bytecode gets generated from that code)
So, to compare:
We go from a truly obscure use of _
that's impossible to google to a reference
uninitialized
that has to be imported from scala.compiletime
and that
has a doc comment explaining what it is.
Looks like a clear win to me.
The other reasonable alternative I see is to eliminate = _
or = uninitialized
altogether.
and recommend that it's substituted by
var x: T = null.asInstanceOf[T]
I find that worse in many dimensions
asInstanceOf
Possible generalization for later:
Allow uninitialized
also as the initializer for local variables. That would require a flow-analysis
similar to Java to ensure that such variables are assigned before use.
On Mon, Feb 1, 2021, 12:53 PM odersky notifications@github.com wrote:
Note: uninitialized is defined erased, so it cannot be misused as a regular value.
What I meant by this is that there's a check that uninitialized cannot be used as a regular value. The only place it is legal in normal code is as the initializer of a mutable field.
(theoretically you could use it also in typelevel code, as long as no actual bytecode gets generated from that code)
So, to compare:
We go from a truly obscure use of _ that's impossible to google to a value uninitialized that has to be imported from scala.compiletime and that has a doc comment explaining what it is.
Looks like a clear win to me.
The other reasonable alternative I see is to eliminate = _ or = uninitialized altogether. and recommend that it's substituted by
var x: T = null.asInstanceOf[T]
I find that worse in many dimensions
- it abuses null
- it abuses asInstanceOf
- it encourages to use the same idiom elsewhere for normal values.
I already suggested something that's better than both. Call it nullOrDefault[A] or something like that, make it legal anywhere but treat it specially in var field initializers.
Actually I would only recommend using it for initializers when the type is generic. If it's known to be a reference type you should write null, if it's numeric 0, and fit Boolean false and for Unit (). That would make code the least surprising. The fact that in reality doing an assignment instruction can be elided is an optimization.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-771038466, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUH3COTSKCP4PK3WK3TS43TDNANCNFSM4WU6TKJQ .
and introducing a seemingly-orthogonal mechanism for this functionality (notInitialized or similar) runs the risk of introducing a whole added source of confusion due to the fact that the expression just doesn't make sense in a ton of contexts, but the compiler will either let you use it (amplifying confusion and injecting unsoundness into the language) or disallow it (amplifying other confusion and removing any orthogonality).
Actually, no. The idea is that the uninitialized
value is declared as erased
(@compileTimeOnly
in Scala-2). So it's not a special case by itself. No confusion is amplified and no orthogonality is lost. The only special action taken by the compiler is to remove
= uninitialized
assignments from the code, and that is exactly the same as it was for _ =
.
I already suggested something that's better than both. Call it nullOrDefault[A] or something like that, make it legal anywhere but treat it specially in var field initializers.
The problem with this is that in a null-safe world, nullOrDefault[A]
should have type A | Nulŀ
, but then it cannot be assigned to a field of type A
, by contrast uninitialized
would let you type it as A
even if Null
is not a subtype of A
. This is fine (or at least not worse than the status-quo) since we cannot avoid having to deal with uninitialized fields on the JVM, and we already have an initialization checker (cf https://dotty.epfl.ch/blog/2020/03/18/23rd-dotty-milestone-release.html).
[This is essentially a duplicate of @smarter's comment, so you can just skip it.]
I considered nullOrDefault
but discarded it because it would have the wrong type. It would have to be defined like this:
def nullOrDefault[T]: T | Null
That's the wrong type for primitive instances of T
. Int | Null
erases to AnyRef
, not Int
.
It's also of dubious value. Normal types don't have default values in any useful sense. So the
danger is that this would be used in areas where it is not appropriate.
Like I said, that's a more general problem. Is it legal to initialize it to null explicitly? If yes how do you avoid the null type issue? If no why not, it will be null anyway, it's silly to not be allowed to say so.
Regarding making it more general, the point is to be useful more generally. Once we are defining a method it could be useful elsewhere, however rarely. It's true that null.asInstanceOf is not a good way to provide that but something should.
If you're not defining a method that's usable anywhere else, and you're not working within the type system, then special syntax is better. Maybe underscore is not the best but something that looks like special syntax conveys that it can only be used in certain places. Something that looks like a method but can only be used in certain places or has a name that only makes sense in such places doesn't seem like good design to me.
On Mon, Feb 1, 2021, 1:20 PM odersky notifications@github.com wrote:
I considered nullOrDefault but discarded it because it would have the wrong type. It would have to be defined like this:
def nullOrDefault[T]: T | Null
That's the wrong type for primitive instances of T. Int | Null erases to AnyRef, not Int. It's also of dubious value. Normal types don't have default values in any useful sense. So the danger is that this would be used in areas where it is not appropriate.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-771057674, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUB7XFHD5OC46UNTTADS43WE7ANCNFSM4WU6TKJQ .
Actually, no. The idea is that the
uninitialized
value is declared aserased
(@compileTimeOnly
in Scala-2). So it's not a special case by itself. No confusion is amplified and no orthogonality is lost. The only special action taken by the compiler is to remove= uninitialized
assignments from the code, and that is exactly the same as it was for_ =
.
It's not like any other erased
stuff, since it remains the only valid erased
term that can on the rhs of a non-erased var
definition. So it is still a special-case, even considering erased
.
In addition, erased
didn't make it into 3.0 anyway, so for all intents and purposes, you still have to document and explain it as a special-case, since you can't explain it in terms of erased
which is not there.
Therefore, I don't think that the argument that "it's just an erased
so not a special-case" holds up.
@odersky
- It does not abuse
null
orasInstanceOf
I don't think null.asInstanceOf[A]
is an abuse of asInstanceOf
. Functionally, it's clearly similar to an unsafe cast: it provides a value that's not truly of the documented type, which constitutes a soundness hole – an escape hatch, just like asInstanceOf
.
When trying to assess the type safety of a program, one needs to determine where soundness holes are introduced. To do so, we already have to look out for uses of asInstanceOf
and @unchecked[Variance]
. Now uninitialized
would be yet another soundness hole to look out for.
I concede that null.asInstanceOf[A]
still can be seen an abuse of null
. We could instead make it Any.asInstanceOf[A]
, where val Any: Any = null
would be added to Predef
, and would be described in doc as an arbitrary value. This has the advantage of making the intent of the uninitialized-var use case clearer.
It's not like any other erased stuff, since it remains the only valid erased term that can on the rhs of a non-erased var definition. So it is still a special-case, even considering erased.
Yes, in the sense that it is removed when used as the initializer of a field. I mentioned that above.
In addition, erased didn't make it into 3.0 anyway, so for all intents and purposes, you still have to document and explain it as a special-case, since you can't explain it in terms of erased which is not there.
But @compileTimeOnly` exists in Scala-2 and is equivalent.
@compileTimeOnly
has 0 common point with erased. It just emits an error if any call to an @compileTimeOnly
method survives until rechecks. That also applies to calls that are themselves within an @compileTimeOnly
member. It's meant for fake methods that should be captured and replaced by a surrounding macro. For example, the .value
of sbt.
Is this legal or not (with null checking)?
class C { var field: String = null }
On Mon, Feb 1, 2021 at 4:43 PM Sébastien Doeraene notifications@github.com wrote:
@compileTimeOnly has 0 common point with erased. It just emits an error if any call to an @compileTimeOnly method survives until rechecks. That also applies to calls that are themselves within an @compileTimeOnly member. It's meant for fake methods that should be captured and replaced by a surrounding macro. For example, the .value of sbt.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-771179394, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUBQ2SPE2PCL56YHQE3S44OBVANCNFSM4WU6TKJQ .
@nafg It's not legal.
@compileTimeOnly has 0 common point with erased. It just emits an error if any call to an @compileTimeOnly method survives until rechecks
And erased
just emits an error if any call to a erased
method survives until after erasure. I don't think that fine point matters here, as long as you say that uninitialized
field assignments are removed before the check.
Is there any motivating reason why = _
should be allowed and not = null
?
On Mon, Feb 1, 2021 at 5:25 PM odersky notifications@github.com wrote:
@nafg https://github.com/nafg It's not legal.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-771202628, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUFIJGAWQJZCP7HTQ6DS44S4FANCNFSM4WU6TKJQ .
Either that means changing the meaning of null
generally - making it inhabit Nothing
rather than Null
- or changing its meaning when on the RHS of a var
. The latter seems definitely wrong to me, while the former seems too risky of a change.
Also, if I remember correctly from the discussions around -Xcheck-init
, there's a difference drawn between = _
and = null
where the former means uninitialised and the latter means initialised to null (similarly for 0/0.0/false..) - relevant to the initialisation checking that flag is defined for.
When is that difference useful (in the presence of null aware types)?
I guess what I'm getting at is that for a var field, its initial value doesn't follow the same rules as its type claims about other reads and writes. That's a fact but is currently limited to _. But given that is the case I don't see why the exception to its type can't be made broader.
Maybe the answer is that initialization checking sort of plugs that hole in soundness, and so the hole shouldn't be widened more than it can be plugged. But that just begs the question. It sounds like this is really a teeny bit of flow typing, so couldn't it be generalized to check if it's become non-null? After all that the underlying purpose of initialization checking, isn't it?
On Tue, Feb 2, 2021, 5:26 AM Dale Wijnand notifications@github.com wrote:
Either that means changing the meaning of null generally - making it inhabit Nothing rather than Null - or changing its meaning when on the RHS of a var. The latter seems definitely wrong to me, while the former seems too risky of a change.
Also, if I remember correctly from the discussions around -Xcheck-init, there's a difference drawn between = _ and = null where the former means uninitialised and the latter means initialised to null (similarly for 0/0.0/false..) - relevant to the initialisation checking that flag is defined for.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lampepfl/dotty/issues/11225#issuecomment-771534815, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYAUAEOT62XXSJ7FPK7OLS47HOFANCNFSM4WU6TKJQ .
its initial value doesn't follow the same rules as its type claims about other reads and writes. I don't see why the exception to its type can't be made broader
I don't follow these parts in particular, but I think I agree with you: I can't come up with how = _
is treated differently to -Xcheck-init
. Maybe I misremembered. But I still believe my points on not changing null
's type.
An obscure use of
_
occurs invar
definitions:It defines a concrete variable
x
without an initial value, or rather the default initial value that the JVM assigns to object fields. It can only be used in a class or object, not to initialize a local variable. It is essential in situations like this:Here we cannot initialize
hd
with a value since its type is parametric, and we do not know a value forA
. The logic inAbstractIterator
makes sure thathd
is read only after it is assigned by checkinghdDefined
before readinghd
.So the idiom is rare but essential where it is needed. But can we find a less obscure notation that does not add another overload to
_
?One possibility would be to define somewhere an abstraction like this:
defaultValue
can be expressed in terms of_
:It can also be a compiler-intrinsic, and then it could replace
= _
in variable definitions.But I am a bit reluctant to do that, for fear that then
defaultValue
will be also used in situations where it is not appropriate. The big problem withdefaultValue
is that it returnsnull
for all class types including boxed numeric types. That's very confusing! We do not want to give this a more prominent status than what we have. The situation would get worse with explicit nulls. In that case, we would have:But then
would be ill-typed. It would have to be:
This is safe, but it defeats the purpose. What should happen for
is that the programmer or the init-checker proves that
x
will be initialized with something else before it is first referenced. That's the intended meaning.So, maybe a better choice is to re-use
compiletime.erasedValue
for this?This means there is no initial value at runtime (the value is erased), so there is no initial assignment. That can be allowed for objects and classes since objects are bulk-initialized (but the init checker should check that such variables are not referenced before being assigned to). It could be allowed in the future also for local variables if we can prove that the variable is initialized by assignment before use. Everywhere else
erasedValue
is illegal, by the current rules.Admittedly,
erasedValue
is also obscure. HowevererasedValue
can be looked up, unlike_
erasedValue
does not add to the confusion relative to the other uses of_
.