ruby / rbs

Type Signature for Ruby

Suggestion/discussion about typing with the splat operator (*, **, "rest" or "varargs") #322

Open abravalheri opened 4 years ago

abravalheri commented 4 years ago

Hello. Lately I have been playing with the type system in Ruby and I noticed some patterns that cannot be represented by the RBS syntax... So I wanted to start a discussion about it. I hope we can get some fruitful insights.

(Note: the discussion here is based on the example about splatting in the syntax document and one of my doubts in issue soutaro/steep#160.)

Motivation

In the syntax doc, it seems that the type of rest arguments can only be uniform (i.e. the same type for all the entries). So in the following example, the current grammar does not seem to allow writing <something> in such a way that the arguments given to lazy are type-matched against the arguments in the signature of @callable (please correct me if I am wrong):

class Effect
  def initialize(callable)
    @callable = callable
  end

  def lazy(*args)
    -> { @callable.(*args) }
  end
end

module MyModule
  def self.func(a, b)
    a * b
  end
end

# @type var effect: ^() -> Array[String | Integer]
effect = Effect.new(MyModule.method(:func)).lazy([1, "a"], 2)
puts "Effect: #{effect.()}" # => [1, "a", 1, "a"]

# ------------------- .rbs ---------------------
class Effect[<something>, S]
  def initialize: (^(*<something>) -> S) -> void
  def lazy: (*<something> args) -> (^() -> S)
end

interface _Prod
  def *: (Numeric) -> _Prod
end

module MyModule
  def self.func: (_Prod, Numeric) -> _Prod
end

Proposal

Based on this, my proposal would be to change the meaning of the type associated with the splat operator so that it refers to the entire array of arguments instead of each argument individually (i.e. make T in (*T) -> S mean the type of the whole array instead of the type of each individual element).

Please notice that with this change we can represent both uniform and heterogeneous lists of arguments (if someone wants to enforce uniform types, that can still be done with (*Array[T]) -> S). Without it, the grammar is somewhat incompatible with the main use of the splat operator nowadays (heterogeneous types).

Please notice the suggestion also applies to keyword arguments: T in (**T) -> S would represent the type of the entire hash of keywords, not the type of each keyword value. If someone wants to enforce uniformity, that could be done with (**Hash[Symbol, T]) -> S.

Summary of the proposal:

                          nowadays            with the proposed change
uniform rest args         (*A, **B) -> C      (*Array[A], **Hash[Symbol, B]) -> C
heterogeneous rest args   --impossible--      (*A, **B) -> C
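At runtime both rows are ordinary splats; the difference is only in what a signature can say about them. A small illustration with hypothetical methods (not part of the proposal itself):

```ruby
# Uniform rest args: every element shares one type (Integer here).
def sum_all(*nums)
  nums.sum
end

# Heterogeneous usage: a mixed-type array is built once and then
# splatted into a method whose positional parameters have different types.
def repeat(str, count)
  str * count
end

args = ["ab", 3] # Array[String | Integer] is the best uniform description
sum_all(1, 2, 3) # => 6
repeat(*args)    # => "ababab"
```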

Other implementations/languages

According to its documentation, Sorbet doesn't seem to support heterogeneous rest args either, and apparently this decision was inspired by Scala. However, Ruby syntax differs from Scala in that respect (disclaimer: I might be wrong here, I am not a Scala programmer). Apparently, according to this website, Scala prohibits the un-splatting of heterogeneous lists and instead provides an alternative method to obtain virtually the same effect. This is not the case in Ruby: un-splatting heterogeneous arrays is super-common (and the standard way we forward args between method calls), and I can't recall any existing alternative to un-splatting... Therefore, I believe taking inspiration from Scala is not a good call on this matter.

We can instead look at how our friends in the Python community are dealing with this, since the grammar for splats in Python and Ruby is almost identical. Indeed, they also opted for uniform rest args, but this decision created a series of issues and difficulties in producing type signatures, even inside the standard library, that they are currently struggling to solve. In particular, it is impossible to annotate complex decorators - one of the pillars of Python's higher-order functions - that modify/hide/augment the argument list of the decorated function. Please notice that the decorator pattern in Python is similar to the Ruby pattern of wrapping blocks and passing blocks around. The following links represent some of the issues pointed out by the Python community and the attempts to solve the limitation (mostly based on providing an alternative construct interpreted by the type checker).

It seems that the main contributors of Python's reference type checker do appreciate that heterogeneous rest args are important and are debating an alternative instrument to represent them, other than directly in the type-annotation grammar, to avoid breaking backward compatibility. However, since the type grammar for stubs in Ruby is not consolidated yet, I see this as a great opportunity to adopt a grammar that can handle both use cases (heterogeneous and uniform) and be more future-proof.

soutaro commented 4 years ago

Hi @abravalheri,

Thank you for reporting this issue. The discussion about Python and Scala really helps me understand the possible solutions.

I'm not very open to adding a new (complicated) feature to support heterogeneous rest args. However, I'm thinking that adding something to support delegation would be an option. Our current understanding is that one of the most common use cases requiring heterogeneous rest args is delegation: passing all of the arguments to another method (as you showed us in your Effect example).

Do you think supporting delegation makes sense?
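For context, Ruby itself already has a syntax-level delegation construct, the `...` forwarding parameter (Ruby 2.7+), which is roughly the pattern such an RBS feature would need to type. A minimal sketch with a hypothetical Calculator class:

```ruby
class Calculator
  def add(a, b, scale: 1)
    (a + b) * scale
  end

  # `...` forwards all positional, keyword, and block arguments
  # to `add` unchanged -- pure delegation.
  def log_and_call(...)
    add(...)
  end
end

calc = Calculator.new
calc.log_and_call(1, 2)            # => 3
calc.log_and_call(1, 2, scale: 10) # => 30
```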

abravalheri commented 4 years ago

Hi @soutaro, thank you very much for your feedback. I do think delegation makes a ton of sense and covers a lot of ground.

Just a question: does it mean that the signature of the outer method/proc has to be 100% the same as the encapsulated one? Or would we have some flexibility to add some control parameters consumed by the outer method and not forwarded?

I don't know if the following example makes sense or is already handled by the current algorithms, but this is my attempt to exemplify the previous paragraph (continuing with my previous example):

class Effect
  ...
  def attempt(times, *args) # or with kw: def attempt(*args, times: 3)
    count = 0
    begin
      return @callable.(*args)
    rescue => e
      count += 1
      if count < times
        puts "Attempt #{count} failed with exception `#{e.inspect}`, retrying..."
        sleep 0.1 * count # back off before retrying
        retry
      else
        raise
      end
    end
  end
end
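The retry-and-forward idea above can be exercised as a self-contained script (hypothetical Flaky wrapper, simplified from the Effect example; the backoff is omitted for brevity):

```ruby
# `attempt` consumes `times` itself and forwards the remaining
# heterogeneous splat args to the wrapped callable.
class Flaky
  def initialize(callable)
    @callable = callable
  end

  def attempt(times, *args)
    count = 0
    begin
      @callable.(*args)
    rescue StandardError
      count += 1
      retry if count < times
      raise
    end
  end
end

calls = 0
# Fails twice, then succeeds, exercising the retry loop.
flaky_mul = lambda do |a, b|
  calls += 1
  raise "boom" if calls < 3
  a * b
end

Flaky.new(flaky_mul).attempt(5, 3, 4) # => 12
```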
abravalheri commented 3 years ago

An update from the Python community:

After starting with a very similar approach (of assuming uniform rest/variadic args) and facing the problems listed in the first comment of this thread, they recently accepted a new proposal on how to express heterogeneous rest args, as described in https://www.python.org/dev/peps/pep-0612. Although possibly not implemented yet, it went through the formal process of approval and should arrive in future versions of the language/type checker.

abravalheri commented 3 years ago

Another update regarding the last comment.

The methodology for argument forwarding/heterogeneous varargs described by PEP 612 is now implemented in Python 3.10 (currently in beta):

https://docs.python.org/3.10/library/typing.html#callable https://docs.python.org/3.10/library/typing.html#typing.ParamSpec https://docs.python.org/3.10/library/typing.html#typing.Concatenate

antstorm commented 2 years ago

Are there any updates on this? Would love to see this functionality in RBS

lloeki commented 1 year ago

it seems that the type of rest arguments can only be uniform (i.e. the same type for all the entries)

Disclaimer: I'm approaching this super naively.

Let's say you have:

# def foo: (*<something>) -> bool
def foo (*args)
  bar(*args)
end

# def bar: (String, Integer) -> bool
def bar(a, b)
  a.to_i == b
end

Then args is necessarily an Array.

So for RBS to match the signature of bar it would need to check that args[0] is a String and args[1] is an Integer, as well as args size being 2.

But it occurred to me that this may be difficult, if possible at all, to check statically right away. Consider:

def foo (*args)
  args.pop # `pop`, or any other method that mutates `args`
  bar(*args)
end

foo('2', 2)

Imagine we had per-element typing; then `pop` would change the type of args from [String, Integer] to [String].

In that case, currently args can only be specified as Array[String | Integer], and so <something> should be String | Integer:

def foo: (*(String | Integer)) -> bool

What may help, though, is doing the checks dynamically, as is already done for variable type checks:

# def hexify: (untyped) -> String
def hexify(val)
  raise unless val.is_a?(Integer)

  val.to_s(16) # Steep is happy with to_s: (Integer) -> String because it sees the runtime check above
end

So one could (theoretically, Steep does not support that) do:

def foo (*args)
  # bar is known to accept (String, Integer)
  raise unless args[0].is_a?(String)
  raise unless args[1].is_a?(Integer)
  raise unless args.size == 2

  bar(*args) # Steep should theoretically be happy now
end

And from that it would presumably be able to infer the type of args[0], like it does for val above.

And indeed it does when using intermediate variables:

def foo (*args)
  # bar is known to accept (String, Integer)
  a = args[0]
  b = args[1]
  raise unless a.is_a?(String)
  raise unless b.is_a?(Integer)

  bar(a, b) # Steep is happy now
end
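A runnable, self-contained version of the pattern above (same foo/bar, with explicit TypeError guards), confirming the narrowed call goes through:

```ruby
def bar(a, b)
  a.to_i == b
end

def foo(*args)
  # Copy into locals so a checker can narrow each one via the guards below.
  a = args[0]
  b = args[1]
  raise TypeError, "expected String"  unless a.is_a?(String)
  raise TypeError, "expected Integer" unless b.is_a?(Integer)

  bar(a, b)
end

foo('2', 2) # => true
```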

I just wish Steep would allow eschewing the intermediate variables, especially with **kwargs.

Note that this is only possible because args is known not to be mutated concurrently; e.g. this would cause problems:

def foo (*args)
  Thread.new { args.pop } # mutates args from another thread

  # bar is known to accept (String, Integer)
  raise unless args[0].is_a?(String)
  raise unless args[1].is_a?(Integer)
  raise unless args.size == 2

  bar(*args) # Steep should theoretically be happy now
end

Anyway, that would presumably not help when such checks can't be done, e.g. when, unlike bar, @callable has an unknown signature; but it is a useful case that I have been faced with and had to work around.

So, back to the original example, with the knowledge that heterogeneous splat args are actually not heterogeneous but a uniform union type, I changed it this way:

class Effect
  def initialize(callable)
    @callable = callable
  end

  def lazy(*args)
    -> { @callable.(*args) }
  end
end

module MyModule
  def self.func(a, b)
    a * b
  end
end

# @type var callable: ^(_Prod, Numeric) -> _Prod
callable = _ = MyModule.method(:func)
effect = Effect.new(callable).lazy([1, "a"], 2)
# @type var effect_cast: ^() -> Array[String | Integer]
effect_cast = _ = effect
puts "Effect: #{effect_cast.()}" # => [1, "a", 1, "a"]

# ------------------- .rbs ---------------------
class Effect[T, S]
  @callable: ^(*T) -> S
  def initialize: (^(*T) -> S) -> void
  def lazy: (*T args) -> (^() -> S)
end

interface _Prod
  def *: (Numeric) -> _Prod
end

module MyModule
  def self.func: (_Prod, Numeric) -> _Prod
end

Now, there are `_ =` whisker casts in there, isolating the specific issues in local variables.

Anyway, this fails with:

uniform_rest.rb:20:20: [error] Cannot pass a value of type `^(::_Prod, ::Numeric) -> ::_Prod` as an argument of type `^(*T(1)) -> S(2)`
│   ^(::_Prod, ::Numeric) -> ::_Prod <: ^(*T(1)) -> S(2)
│
│ Diagnostic ID: Ruby::ArgumentTypeMismatch
│
└ effect = Effect.new(callable).lazy([1, "a"] , 2)

Instead, by typing callable with a union type this way:

# @type var callable: ^(*(_Prod | Numeric)) -> _Prod
callable = _ = MyModule.method(:func)

Then steep check passes, and we achieve a kind of reasonable, if verbosely annotated, type checking of heterogeneous splat args, although incomplete in index checking. At least it checks that the contained types are correct.

Changing the call to have a non-Numeric argument:

# @type var callable: ^(*(_Prod | Numeric)) -> _Prod
callable = _ = MyModule.method(:func)
effect = Effect.new(callable).lazy([1, "a"], :a)
# @type var effect_cast: ^() -> Array[String | Integer]
effect_cast = _ = effect
puts "Effect: #{effect_cast.()}"

Produces the expected error:

uniform_rest.rb:20:46: [error] Cannot pass a value of type `::Symbol` as an argument of type `(::_Prod | ::Numeric)`
│   ::Symbol <: (::_Prod | ::Numeric)
│     ::Symbol <: ::_Prod
│
│ Diagnostic ID: Ruby::ArgumentTypeMismatch
│
└ effect = Effect.new(callable).lazy([1, "a"] , :a)

To enforce the type of the first argument, we'd be required to actually enforce it on the callable:

# @type var callable: ^(*(Array[String | Integer] | Numeric)) -> Array[String | Integer]
callable = _ = MyModule.method(:func)
effect = Effect.new(callable).lazy([:a, "a"], 2)
puts "Effect: #{effect.()}"

Which produces:

uniform_rest.rb:20:35: [error] Cannot pass a value of type `::Array[(::Symbol | ::String)]` as an argument of type `(::Array[(::String | ::Integer)] | ::Numeric)`
│   ::Array[(::Symbol | ::String)] <: (::Array[(::String | ::Integer)] | ::Numeric)
│     ::Array[(::Symbol | ::String)] <: ::Array[(::String | ::Integer)]
│       (::Symbol | ::String) <: (::String | ::Integer)
│         ::Symbol <: (::String | ::Integer)
│           ::Symbol <: ::String
│             ::Object <: ::String
│               ::BasicObject <: ::String
│
│ Diagnostic ID: Ruby::ArgumentTypeMismatch
│
└ effect = Effect.new(callable).lazy([:a, "a"] , 2)

As mentioned, it does allow one to pass this though:

effect = Effect.new(callable).lazy(1.2, [1, "a"])

But at least we're halfway there and check for unexpected types.


Tangent: back to the downcast to _Prod in the original example:

I don't see a way to go back up to Array[String | Integer] since we downcasted to _Prod. I do feel this may be out of scope though, as IIUC the core issue in the example is about propagating the delegated method's args up through the generic wrapper and defining Effect's types in a generic way, not deriving the args from the return value.

I've been toying with generics for this:

# @type var callable: ^(*(_Prod[Array[String | Integer]] | Numeric)) -> Array[String | Integer]
callable = _ = -> (a, b) { a * b }
effect = Effect.new(callable).lazy([:a, "a"], 2)
puts "Effect: #{effect.()}"

# ------------------- .rbs ---------------------
interface _Prod[T]
  def *: (Numeric) -> T
end

Which, if you are in a context where at some point you can specify the T in _Prod[T], such as:

class MyClass[T]
  def initialize: (T) -> void
  def make_callable: () -> ^(*(_Prod[T] | Numeric)) -> T
  def reticulate_splines: ...
end

Could be useful to handle the return type as well in a mostly generic fashion and get rid of the last type annotation on callable.
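To make the duck type concrete: a hypothetical class satisfying the _Prod[T] shape above, where `*` produces the T the generic signature would capture:

```ruby
# Hypothetical implementation of the _Prod[T] duck type: wraps an array
# so that `*` produces Array[String | Integer] (the T in _Prod[T]).
class Repeater
  def initialize(items)
    @items = items
  end

  def *(other)
    @items * other # Array#* with an Integer repeats the array
  end
end

Repeater.new([1, "a"]) * 2 # => [1, "a", 1, "a"]
```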

lloeki commented 1 year ago

Discussion about preferring to tackle that as delegation notwithstanding, regarding the syntax of the proposed change:

 (*Array[A], **Hash[Symbol, B]) -> C

This does not seem to tackle enforcing position (or key, for kwargs), so it seems equivalent to a uniform union type.

I think I would prefer mimicking records and tuples, which seem to map better with *args and **kwargs:

(*[A, AA, AAA], **{ foo: B, bar: BB }) -> C

In addition, it seems to be better at enforcing arity and per-position/per-key types, which tuples and records naturally achieve.

This would probably not cover variadic arguments though.
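At runtime, the shapes the tuple/record syntax would describe are exactly how splat and double-splat parameters arrive; a tiny illustration with a hypothetical method:

```ruby
# *args collects positionals into an Array (tuple-shaped),
# **kwargs collects keywords into a Symbol-keyed Hash (record-shaped).
def capture(*args, **kwargs)
  [args, kwargs]
end

capture(1, "a", foo: :b) # args == [1, "a"], kwargs == { foo: :b }
```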

abravalheri commented 1 year ago

Hi @lloeki, using records and tuples does sound like an improvement over the original proposal.

I think, however, they are not incompatible. In my original post, I added a table (slightly modified version):

                          nowadays            with the proposed change
uniform rest args         (*A, **B) -> C      (*Array[A], **Hash[Symbol, B]) -> C
heterogeneous rest args   --impossible--      (*D, **E) -> F

In the second row, we can assume that D and E are aliases for D = [A, AA, AAA] and E = { foo: B, bar: BB }. I think that makes it easier to understand...

At the conceptual level, nowadays all variadic/keyword arguments are assumed to be of the same type. With the change, the type itself would be "splatted", which allows for records and tuples.