scala / scala3

The Scala 3 compiler, also known as Dotty.
https://dotty.epfl.ch
Apache License 2.0

Scala Wart: Weak eta-expansion #2570

Closed lihaoyi closed 7 years ago

lihaoyi commented 7 years ago

Opening this issue, as suggested by Martin, to provide a place to discuss the individual warts brought up in the blog post Warts of the Scala Programming Language and the possibility of mitigating/fixing them in Dotty (and perhaps later in Scala 2.x). These are based on Scala 2.x behavior, which I understand Dotty follows closely; apologies in advance if it has already been fixed.


Scala maintains a distinction between "functions" and "methods": in general, methods are things you call on an object, whereas functions are objects themselves. However, since they're so similar ("things you can call"), Scala gives you an easy way to wrap a method in a function object, a conversion called "eta expansion".
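Roughly speaking, eta expansion is shorthand for writing a forwarding lambda by hand. A minimal sketch (the object name EtaBasics is just for illustration) that puts a hand-written lambda next to the compiler-generated conversion; both values simply forward to the method repeat:

```scala
object EtaBasics {
  // A method: it belongs to the enclosing object and is not itself a value
  def repeat(s: String, i: Int): String = s * i

  // Eta expansion written out by hand: a function value whose body calls the method
  val manual: (String, Int) => String = (s, i) => repeat(s, i)

  // The same conversion performed by the compiler, driven by the expected type
  val expanded: (String, Int) => String = repeat
}
```

Both values are ordinary function objects, so they can be stored, passed around, and called like any other value.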

@ def repeat(s: String, i: Int) = s * i
defined function repeat

@ repeat("hello", 2)
res89: String = "hellohello"

@ val func = repeat _
func: (String, Int) => String = $sess.cmd90$$$Lambda$2796/1082786554@2a3983b9

@ func("hello", 3)
res91: String = "hellohellohello"

Above, we use the underscore _ to assign repeat _ to a value func, which is then a function object we can call. This can also happen automatically, without the _, based on the "expected type" of the place where the method is used. For example, if we expect func to be a (String, Int) => String, we can assign repeat to it without the _:

@ val func: (String, Int) => String = repeat
func: (String, Int) => String = $sess.cmd92$$$Lambda$2803/669946146@46226d53

@ func("hello", 3)
res92: String = "hellohellohello"

Or by stubbing out the arguments with _ individually:

@ val func = repeat(_, _)
func: (String, Int) => String = $sess.cmd98$$$Lambda$2832/1025548997@358b1f86

This works, but has a bunch of annoying limitations. Firstly, even though you can fully convert the method repeat into a (String, Int) => String value using _, you cannot partially convert it:

@ val func = repeat("hello", _)
cmd4.sc:1: missing parameter type for expanded function 
((x$1: <error>) => repeat("hello", x$1))
val func = repeat("hello", _)
                           ^
Compilation Failed

Unless you know the "expected type" of func, in which case you can partially convert it:

@ val func: Int => String = repeat("hello", _)
func: Int => String = $sess.cmd93$$$Lambda$2808/1138545802@2c229ed2

Or unless you provide the type of the partially-applied-function argument _ manually:


@ repeat("hello", _: Int)
res4: Int => String = $sess.cmd4$$$Lambda$1988/1407003104@5eadc347

This is a bit strange to me. If I can easily convert the entire repeat method into a function without specifying any types, why can I not convert it into a function if I already know one of the arguments? After all, I have provided strictly more information in the repeat("hello", _) case than I have in the repeat(_, _) case, and yet somehow type inference got worse!
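The placeholder form is shorthand for a one-parameter lambda, so the question is where that parameter's type comes from. A sketch of the three spellings that do compile under Scala 2 rules (they should also compile under Scala 3); the object name is illustrative only:

```scala
object PartialApplication {
  def repeat(s: String, i: Int): String = s * i

  // What repeat("hello", _) stands for: a lambda whose parameter needs a type
  val explicitLambda: Int => String = i => repeat("hello", i)

  // The type given inline on the placeholder, as in the REPL session above
  val ascribed = repeat("hello", _: Int)

  // The type supplied by the expected type of the val instead
  val expected: Int => String = repeat("hello", _)
}
```

In each case the Int comes from the programmer, even though repeat's own signature already states it.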

Furthermore, there's a more fundamental issue: if I know that repeat is a method that takes two arguments, why can't I just do this?

@ val func = repeat
cmd99.sc:1: missing argument list for method repeat in object cmd88
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `repeat _` or `repeat(_,_)` instead of `repeat`.
val func = repeat
           ^
Compilation Failed

After all, since the compiler already knows that repeat is a method, and that it doesn't have its arguments provided, why not convert it for me? Why force me to go through the _ or (_, _) dance, or ask me to provide an expected type for func when it already knows the type of repeat?

In other languages with first-class functions, like Python, this works fine:

>>> def repeat(s, i):
...     return s * i
...

>>> func = repeat

>>> func("hello", 3)
'hellohellohello'

The lack of automatic eta-expansion results in people writing weird code to work around it, such as this example from ScalaMock:

"drawLine" should "interact with Turtle" in {
  // Create mock Turtle object
  val mockedTurtle = mock[Turtle]

  // Set expectations
  (mockedTurtle.setPosition _).expects(10.0, 10.0)
  (mockedTurtle.forward _).expects(5.0)
  (mockedTurtle.getPosition _).expects().returning(15.0, 10.0)

  // Exercise System Under Test
  drawLine(mockedTurtle, (10.0, 10.0), (15.0, 10.0))
}

Here, the weird (foo _) dance is something that they have to do purely because of this restriction in eta-expansion.

While I'm sure there are good implementation reasons why this doesn't work, I don't see any reason it shouldn't work from a language-semantics point of view. From a user's point of view, methods and functions are just "things you call", and Scala is generally successful at not asking you to think about the distinction between them.

However, in cases like this, I think the compiler should try a bit harder to figure out what I want before giving up and asking me to pepper _s or expected types all over the place. The compiler already has all the information it needs (after all, it works if you put an _ after the method), and it just needs to use that information when the _ isn't present.

senia-psm commented 7 years ago

Another use case: the ask pattern for typed actors in Akka. One can use it this way:

val result: Future[Result] = actor ? (Message("msg", _))
complete(result)

but not this way:

complete {
  actor ? Message("msg", _)
}

See this example

dragos commented 7 years ago

After all, since the compiler already knows that repeat is a method, and that it doesn't have its arguments provided, why not convert it for me? Why force me to go through the _ or (_, _) dance, or why ask me to provide an expected type for func if it already knows the type of repeat?

If I remember correctly, it used to work like this, but it was too error-prone when used on Java methods, in particular toString, which people like to call without the (). So code like this did something unexpected (can you guess?):

"My foo: " + foo.toString

or even better:

if ("foo" == foo.toString) ...

In both cases eta-expansion wasn't what you wanted. Perhaps the second one can be ruled out based on the expected type, or perhaps requiring parentheses (which, I totally agree, is a wart) would help, but people might still make the mistake. So should both these cases be an error? What error should it be?
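To make the first snippet concrete: if foo.toString were eta-expanded instead of called, the concatenation would still type-check, because String#+ accepts Any. A sketch that writes the hypothetical expansion out by hand (foo and its value are made up for illustration):

```scala
object NullaryPitfall {
  // Hypothetical stand-in for the foo in the snippets above
  val foo: Any = List(1, 2)

  // What the programmer meant: call toString
  val meant: String = "My foo: " + foo.toString

  // What an automatic eta-expansion of foo.toString would amount to:
  // a () => String value that is never called
  val expanded: () => String = () => foo.toString

  // Still compiles, but embeds the function object's own toString,
  // not "List(1, 2)"
  val actual: String = "My foo: " + expanded
}
```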

lihaoyi commented 7 years ago

I personally don't think your examples are that unexpected; we pass around Function0[T]s all the time, and if you forget to call them, well, they behave differently than if you did call them. People sometimes accidentally send Akka actors companion objects rather than instantiated messages. I've spent hours during refactorings tracking down places where I'm comparing (_: T) == (_: Some[T]). I even have Some(Hello World Blog) unintentionally as the title of one of my Scala-generated blog posts in Disqus!

[Screenshot, 2017-05-30 14:12:45]

These are all basically the same problem: any2StringAdd, String#+(x: Any), .toString, ==, and others are type-unsafe. That could be solved by introducing some kind of SuperAny that doesn't have any methods, and making Any a subclass of that, though that would be a pretty large upheaval.
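The Disqus title above is one instance of this: string interpolation and String#+ accept Any, so an Option[String] used where a String was meant still compiles and silently renders its own toString. A minimal sketch (the title value is a hypothetical reconstruction):

```scala
object StringlyTyped {
  // Hypothetical blog title that was accidentally left wrapped in an Option
  val title: Option[String] = Some("Hello World Blog")

  // Both compile, because interpolation and String#+ take Any
  val interpolated: String = s"$title"
  val concatenated: String = "" + title
}
```

Both strings come out as "Some(Hello World Blog)", and nothing in the types flags the mistake.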

Nevertheless, I think it's strange to have a language feature propagate through every codebase in the world just because people make mistakes when calling a single method .toString. If we really wanted to, we could overload .toString via compiler magic so it has both .toString and .toString() versions. Or, as mentioned in https://github.com/lampepfl/dotty/issues/2571, we could perform that compiler magic for any Java methods (and Scala methods with a magic annotation), so it'll just become a Java interop quirk rather than a Scala wart. So even if we decide that people forgetting ()s during Java interop is a problem, there are solutions that could avoid that problem while removing all the weirdness from pure-Scala code.

dragos commented 7 years ago

Nevertheless, I think it's strange to have a language feature propagate through every codebase in the world just because people make mistakes when calling a single method .toString

toString is just the simplest example; it's certainly not the only one! It's a thorny issue: silently accept erroneous code, or ask people to be explicit in some cases. Beginners will get this wrong and get burned at runtime, and the fact that Akka or other places in the language have similar gotchas still doesn't make it a clear win, I'd say.

Maybe the rule can be relaxed to require the _ only when the method takes no parameters?

Jasper-M commented 7 years ago

Whatever you do, with automatic eta-expansion you will always have the special case of methods without parameter lists (def foo = "foo"), which cannot be automatically converted. Even for def foo() = "foo", automatic eta-expansion is apparently deprecated.

adriaanm commented 7 years ago

Maybe the rule can be relaxed to require the _ only when the method takes no parameters?

That's the plan set in motion by https://github.com/scala/scala/pull/5327

SethTisue commented 7 years ago

I'm leaning against this change.

It makes more things that more or less make sense compile. Isn't that appealing? Sure, but it's a dangerous appeal. The more things compile, the more ways you can screw up:

As @lrytz pointed out in conversation just now, an example of the latter is a refactoring that adds arguments to a method that didn't have them before. Now .foo still compiles, but is no longer a method call. If you're lucky, the types won't line up and you'll get an error; if not, not.

Admittedly none of this is conclusive, but I think we should be very cautious about adding yet more ways of writing the same thing to the language, and that's what this change does.
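A sketch of the refactoring hazard @lrytz describes, assuming Scala 3 semantics, where a reference to a method with parameters is eta-expanded automatically. The Box class and its scale parameter are hypothetical:

```scala
// Assumes Scala 3: unapplied methods with parameters are eta-expanded on reference
object RefactoringHazard {
  class Box(side: Int) {
    // Before the (hypothetical) refactoring this was `def area: Int = side * side`;
    // a scale parameter was added later
    def area(scale: Int): Int = side * side * scale
  }

  val box = new Box(3)

  // An old call site written as `box.area` still compiles,
  // but now yields a function Int => Int instead of an Int
  val stillCompiles = box.area
}
```

Whether this surfaces as an error depends entirely on what the call site does with the value next, which is the point being made above.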

odersky commented 7 years ago

If we really wanted to, we could overload .toString via compiler magic so it has both .toString and .toString() versions.

toString is just used as an example. There are many more such methods. length is even worse than toString in that some of its usages are fields (i.e. on arrays) and others are methods, which would then require a (). I would guess that the majority of nullary methods in Java should be expressed without ().
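In Scala source the two kinds of length are written identically today, which is why forcing () on the Java method would be awkward. A small sketch (the values are arbitrary):

```scala
object JavaNullary {
  val arr: Array[Int] = Array(1, 2, 3)
  val str: String = "hello"

  // On arrays, length corresponds to the JVM array length (a field in Java source)
  val arrLen: Int = arr.length

  // On strings it is the Java method String.length(); Scala lets you omit the ()
  val strLen: Int = str.length
}
```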

odersky commented 7 years ago

After discussing this with @adriaanm, we are leaning towards the following compromise proposal for handling references to unapplied methods m:

  1. If m has one or more parameters, we always eta expand

  2. If m is nullary (i.e. has type ()R):

    1. If the expected type is of the form () => T, we eta expand.
    2. If m is defined by Java, or overrides a Java defined method, we insert ().
    3. Otherwise we issue an error of the form:

      Unapplied nullary methods are only converted to functions when a function type is expected. You need to either apply the method to (), or convert it to a function with () => m().

  3. The syntax m _ is deprecated.
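The proposed rules can be illustrated with a short sketch. It assumes Scala 3, which adopted automatic eta-expansion for methods with parameters; repeat comes from the original post and greeting is a hypothetical nullary method:

```scala
// Assumes Scala 3
object ProposalRules {
  def repeat(s: String, i: Int): String = s * i
  def greeting(): String = "hi" // hypothetical nullary method, type ()String

  // Rule 1: m has parameters, so a bare reference is eta-expanded (no `repeat _` needed)
  val f = repeat

  // Rule 2.1: m is nullary and the expected type is () => T, so it is eta-expanded
  val g: () => String = greeting

  // The explicit form suggested by the rule 2.3 error message
  val h = () => greeting()
}
```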

smarter commented 7 years ago

Interesting! I love the idea of having one less meaning for _. What about implicit function types, do methods with implicit parameter lists ever get automatically converted to them?

lihaoyi commented 7 years ago

we are leaning towards the following compromise proposal for handling references to unapplied methods m

👍

I'd personally prefer to eta-expand () Scala methods too, but I'd be happy with an error message.

I suppose def foo = ... functions without arguments will now need to be called with () => foo as well, with foo _ being deprecated as well?

EDIT: I guess the whole eta-expand-nullary-methods thing was so controversial that everybody forgot about the earlier case: what do people think about being able to expand

def repeat(s: String, i: Int) = s * i
val partial = repeat("hello", _)

without an expected type?
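As far as I know, Scala 3 ended up accepting this form by taking the placeholder's type from repeat's own signature. The sketch below assumes that behavior; it will not compile under Scala 2, which reports the "missing parameter type" error shown earlier:

```scala
// Assumes Scala 3: the type of `_` is inferred from repeat's second parameter
object PartialNoExpectedType {
  def repeat(s: String, i: Int): String = s * i

  // No expected type on the val and no ascription on the placeholder
  val partial = repeat("hello", _)
}
```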

adriaanm commented 7 years ago

I suppose def foo = ... functions without arguments will now need to be called with () => foo as well, with foo _ being deprecated as well?

Yes, a reference foo to a definition def foo would not be eta-expanded implicitly, and your only option would be to write the expansion out yourself: () => foo. For def foo(), you could write foo: (() => T) (or have the type ascription be implied by the surrounding code).
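A sketch of the two cases described here, with hypothetical definitions foo (parameterless) and bar (empty parameter list). The ascribed type is parenthesized because an ascription takes an infix type, not a bare function type:

```scala
object NullaryForms {
  def foo: Int = 42   // parameterless: no parameter list at all
  def bar(): Int = 42 // nullary: one empty parameter list

  // For a parameterless def, write the lambda out yourself
  val fromFoo: () => Int = () => foo

  // For an empty-paren def, a type ascription supplies the expected function type
  val fromBar = (bar: (() => Int))
}
```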

adriaanm commented 7 years ago

repeat("hello", _)

Martin and I have both been working on an implementation for this :-)

odersky commented 7 years ago

Martin and I have both been working on an implementation for this :-)

See #2691.

odersky commented 7 years ago

Interesting! I love the idea of having one less meaning for _. What about implicit function types, do methods with implicit parameter lists ever get automatically converted to them?

No. Methods with implicit parameter lists always get applied to implicit arguments.

smarter commented 7 years ago

But what about def foo(x: A)(implicit bla: B): C, does it become A => B => C or A => implicit B => C ?

odersky commented 7 years ago

It becomes A => C, same as what would happen if you write foo _ now.
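A sketch of what "It becomes A => C" means in practice: the implicit parameter list is resolved where the expansion happens instead of turning into a second function parameter. Config, defaultConfig, and render are hypothetical names:

```scala
object ImplicitEta {
  final case class Config(prefix: String)

  // Hypothetical implicit value; it must be in scope where the expansion happens
  implicit val defaultConfig: Config = Config(">> ")

  def render(x: Int)(implicit cfg: Config): String = cfg.prefix + x

  // Eta-expanding render yields Int => String, with defaultConfig filled in,
  // rather than Int => Config => String
  val f: Int => String = render
}
```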

willisjtc commented 7 years ago

Why not just do as Java does and reference functions with the double-colon operator (or something similar), and any time you want the function to be executed, just give its name? For example, object::length returns the function and object.length calls the property/function.

lihaoyi commented 7 years ago

The linked diff appears to also enforce the correct-number-of-empty-parens; should https://github.com/lampepfl/dotty/issues/2571 be closed as well?

smarter commented 7 years ago

Yes indeed.