Proposal for a `dotOperator` replacement

n0bra1n3r commented 3 years ago

Description

When the compiler encounters a call to an undeclared routine, it invokes a type-bound operator, =call, at compile time and then calls the routine that operator generates.
Effectively replaces the experimental dotOperator feature.
May serve as an alternative to term rewriting macros for some things (recently discussed in the forums).

Illustration

type Proxy = object

macro `=call`(obj: Proxy, op: untyped, args: varargs[typed]) =
  echo op.repr
  # implementation
  ...

let proxy = Proxy()

discard proxy.test # prints 'test'
proxy.test = 1 # prints 'test='
proxy.test += 1 # prints 'test+='
test(proxy, 1) # prints 'test'
discard proxy + 1 # prints '+'

Rationale

Very useful for ad hoc interop with other languages and remote resources.
Leverages Nim's existing features such as UFCS, command invocation syntax, and operator invocation to naturally provide custom behavior.

Definition of Terms

I use the term "routine" here to mean anything that's callable, including procs, operators, templates, and macros. I use the term "procedure" to refer to anything that has a function pointer like procs, funcs, and iterators.

Usecases

Interop

type CppObj {.importcpp, header: "CppObj.h".} = object

template `=call`(obj: CppObj, op: untyped) =
  proc op(_: CppObj) {.importcpp.}

let cppObj = CppObj()
cppObj.cppMethod()

Note that this technique can also be used to call arbitrary functions from DLLs, allowing runtime interop and custom REPL-like functionality.

Field swizzling (GLSL-like field access)

type Vector3 = object
  x, y, z: float
type Vector2 = object
  x, y: float

macro `=call`(vec: Vector3, op: untyped) =
  let obj = nnkObjConstr.newTree(ident("Vector" & $len($op)))
  let fields = vec.getType[2]

  for i, field in $op:
    obj.add(newColonExpr(fields[i], newDotExpr(vec, ident($field))))

  template makeProc(op, obj) =
    proc op(vec: Vector3): auto = obj

  result = getAst makeProc(op, obj)

let vec = Vector3(x: 1, y: 2, z: 3)

echo vec.zy # prints '(x: 3.0, y: 2.0)'
echo vec.yzx # prints '(x: 2.0, y: 3.0, z: 1.0)'

A common usecase for doing shader- and graphics-related math.

Object proxies

type Proxy[T] = object
  obj: T

template `=call`[T](proxy: Proxy[T], op: untyped): untyped =
  proc op(p: Proxy[T]): auto {.inline.} =
    # intercept field access and/or procedure calls
    ...
    result = op(p.obj)

let proxy = Proxy[string](obj: "Hello")

echo $proxy # prints 'Hello'

This usecase could lessen the need for converters, since most operations can be forwarded through a proxy to the underlying object. This could also allow things like mapping field access to entity component access in an ECS.

Remote function invocation

type Network = object

template `=call`(obj: Network, op: untyped) =
  # perform operation to execute remote function or API
  ...

doStuffOverNetwork(Network())

This could be used to access arbitrary columns in a remote database for example, or execute an API with a nice syntax.

Postfix-like operators

type Obj = object

template `=call`(obj: Obj, op: untyped) =
  when astToStr(op)[^1] == '?':
    # generate proc to do nullable stuff
    ...
  else:
    ...

let obj = Obj()
let valueOrNil = obj.optionalField?.field

This allows e.g. Swift-like optional syntax without special-casing `?.` to have the same precedence as `.`.

Extensions to computed properties

type Obj = object

template `=call`(obj: Obj, op: untyped, arg: int) =
  case astToStr(op)[^2 .. ^1]:
  of "+=":
    ...
  of "-=":
    ...
  of "*=":
    ...

let obj = Obj()
obj.field += 1

Allows overriding operations on computed properties. Current Nim has field= and .=, which allow overriding only obj.field = 1, but not obj.field += 1.
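For comparison, here is a minimal sketch of what the existing experimental dot operators already cover. The `Bag` type, its `store` field, and the `apples` name are hypothetical, chosen only for illustration; the sketch assumes a compiler where `{.experimental: "dotOperators".}` is available.

```nim
{.experimental: "dotOperators".}

import std/tables

type Bag = object
  store: Table[string, int]  # hypothetical backing store for "computed" fields

# Read access: any unknown field name becomes a table lookup (0 if missing).
template `.`(bag: Bag, field: untyped): int =
  getOrDefault(bag.store, astToStr(field))

# Write access: `bag.name = value` stores under the field's name.
template `.=`(bag: var Bag, field: untyped, value: int) =
  bag.store[astToStr(field)] = value

var bag = Bag()
bag.apples = 1
doAssert bag.apples == 1
```

Plain reads and writes each have a hook here, but there is no corresponding hook for a compound form such as `+=`, which is the gap this use case points at.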

Comparison with Existing Solutions

The experimental dotOperator facilitates many of the usecases in this RFC. Some disadvantages of this approach are:
- The meaning of . is overloaded:
  - default: object field access
  - default: sugar for routine invocation
  - dotOperator: arbitrary code can be generated that may or may not result in field access or a routine invocation.
- . can be declared in any module and importing a module with this operator defined could add surprises. =call must be declared in the same module as the type it operates on which avoids this.
- . becomes special and can do things that UFCS and command invocation syntax cannot in the context of routine invocation.
- Allows a limited number of operator overloads. For example, .= can be overloaded but .+= cannot.
- Seems to be emulating ternary operators (e.g. .= and .()), which don't exist in Nim. This leads to a lot of special casing and perhaps complexity in the compiler. With this RFC, operators like += become just an argument to =call, and .() becomes implicitly enabled via UFCS mechanics.
Term-rewriting macros could be modified to provide similar or even greater flexibility as mentioned in this comment. However it is probably not an ideal solution, since TRMs weren't designed for these usecases.
Dot-like operators like the one in RFC https://github.com/nim-lang/RFCs/issues/341 have the advantage of enforcing that a symbol come after . (.?, .!, etc.). This provides a visual cue to warn of magic that may happen as a result of calling this operator. They still have many of the disadvantages of dotOperators, plus the following:
- Assumes that all dot-like operators are intended to be used for property access (or similar). This may rule out usecases like matrix1 .* matrix2 for matrix dot or element-wise multiplication, or Matlab-inspired operators like those used in nim-glm.
- Essentially adds a completely new set of operators aside from the ones already in Nim, with special rules to make them act like `.`.

Proposed Mechanics

=call must be defined in the same module as the type it operates on, similar to =destroy and family.
The compiler will attempt to call =call when a call to an undefined routine is encountered, and then will try to call the routine. It may be desirable to enforce that the routine generated by =call is a procedure definition or overloads of a function.

proxy.property = 1 # `property=` is an undeclared routine
# If `=call` is defined for `typeof(proxy)`, the above effectively expands to:
# `=call`(proxy, `property=`, 1)
# `property=`(proxy, 1)

undeclaredRoutine(proxy) # `undeclaredRoutine` is an undeclared routine
# If `=call` is defined for `typeof(proxy)`, the above effectively expands to:
# `=call`(proxy, undeclaredRoutine)
# undeclaredRoutine(proxy)

proxy + 1 # `+` is an undeclared routine
# If `=call` is defined for `typeof(proxy)`, the above effectively expands to:
# `=call`(proxy, `+`, 1)
# `+`(proxy, 1)

The compiler will greedily consider any non-. operators after the undeclared routine call as part of the routine name passed to =call.

proxy.property += 1 # `property` is an undeclared routine
# If `=call` is defined for `typeof(proxy)`, the above effectively expands to:
# `=call`(proxy, `property+=`, 1)
# `property+=`(proxy, 1)
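
The two-step expansion above (generate the routine, then call it) can be approximated in current Nim by invoking the generator by hand. This is only a sketch: `generateRoutine` is a hypothetical stand-in for =call, and unlike the proposal it must be called explicitly rather than being triggered by the compiler.

```nim
type Proxy = object
  hits: int

# Hypothetical stand-in for `=call`: declares a proc named `op` for type `T`.
template generateRoutine(T: typedesc, op: untyped) =
  proc op(p: var T): int =
    inc p.hits
    result = p.hits

# Step 1: what the compiler would do on seeing an undeclared routine.
generateRoutine(Proxy, undeclaredRoutine)

# Step 2: the original call now resolves like any other proc,
# in both regular call syntax and method call syntax.
var proxy = Proxy()
doAssert undeclaredRoutine(proxy) == 1
doAssert proxy.undeclaredRoutine == 2
```

Because the generated symbol is an ordinary proc, the call syntaxes Nim already supports apply to it without special handling of `.`.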

Related Literature

Dlang implements a similar concept for its operators, but instead of passing an AST to its "template methods", it passes a compile-time string containing the invoked operator. It is then up to the programmer to define and implement functionality for each possible operator passed to these templates.

Note that Dlang template methods actually expand to the implementation of a method plus the invocation of that method upon use, in the same way the proposed =call does.
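
A rough Nim analogue of the Dlang approach, for comparison: the operator arrives as a compile-time string and the implementation branches on it. `Vec2` and `binaryOp` are hypothetical names used only for this sketch; Dlang's actual hook is `opBinary(string op)`.

```nim
type Vec2 = object
  x, y: int

proc binaryOp(op: static string, a, b: Vec2): Vec2 =
  # `op` is known at compile time, so `when` keeps exactly one branch.
  when op == "+":
    result = Vec2(x: a.x + b.x, y: a.y + b.y)
  elif op == "-":
    result = Vec2(x: a.x - b.x, y: a.y - b.y)
  else:
    # Any other operator string is rejected at compile time.
    {.error: "unsupported operator".}

doAssert binaryOp("+", Vec2(x: 1, y: 2), Vec2(x: 3, y: 4)) == Vec2(x: 4, y: 6)
```

The difference from the proposal is that =call would receive the routine name as an AST node instead of a string, so it can be spliced directly into a generated proc definition.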

In the current or upcoming Nim, there are features that may have similar mechanics to this one, namely:

type-bound operators like =destroy and family
for-loop macros
case statement macros
items iterator and family

Important Links

sample playground: https://play.nim-lang.org/#ix=3b6B
forum post: https://forum.nim-lang.org/t/7752
dot operators: https://nim-lang.org/docs/manual_experimental.html#special-operators-dot-operators
term rewriting macros: https://nim-lang.org/docs/manual_experimental.html#term-rewriting-macros
related RFC: https://github.com/nim-lang/RFCs/issues/341
Dlang's operator overloading: https://dlang.org/spec/operatoroverloading.html

Varriount commented 3 years ago

So, just to confirm my interpretation:

Currently, dot operators can produce any result, possibly breaking the semantic expectation the . syntax has.
The proposal is to essentially refine these operators so that they are expected to produce a routine (or is it only procedures?) of some sort, and have the compiler produce code that calls the routine.

n0bra1n3r commented 3 years ago

@Varriount Yes for both points.

The proposal also goes a bit further than your second point though. It tries to do away with special casing of . (or any other operator), and instead allows the programmer to generate a procedure if it is not declared when a call to it is encountered by the compiler. This call can be in any syntax that Nim supports (method call syntax, command invocation syntax, operator syntax, etc.). I tried to describe this under the Proposed Mechanics section a bit.

Please let me know if anything is not clear. I really want to make this a good RFC. Thanks!

Araq commented 3 years ago

The RFC is well-written but there is a fundamental design tension between "let's have custom dot-like operators" and "the behavior of the dot notation is overridable". I much prefer "custom dot-like operators" which rules out "Object Proxying" entirely, no matter the details of how it's done.

Having said that, a design should probably focus on allowing convenient user definable smart pointers.

n0bra1n3r commented 3 years ago

@Araq I see. So to clarify, object proxying is something you want to only be possible in specific scenarios (like for implementing smart pointers)? Or is it that object proxying is evil, full stop? The RFC was based on the idea that you could proxy types from other languages and perform operations on them just like you would with Nim types, without specifying every detail of the implementation.

I guess my gripe with dot operator overloading is that it changes the meaning of . (or any variation of it) from "just another way to call functions or to access object fields" to "a special way to generate arbitrary code, plus the other stuff". I'm not sure this RFC completely solves that either though, but at least the result of any . operation is guaranteed to be the result of a function call (or field access).

Varriount commented 3 years ago

> The RFC is well-written but there is a fundamental design tension between "let's have custom dot-like operators" and "the behavior of the dot notation is overridable".

Another way to look at this might be: Term-rewriting macros are either not powerful enough, or not usable enough (from an ease-of use perspective) to apply to the expressions currently targeted by dot operators.

Araq commented 3 years ago

> I guess my gripe with dot operator overloading is that it changes the meaning of . (or any variation of it) from "just another way to call functions or to access object fields" to "a special way to generate arbitrary code, plus the other stuff".

That's a good way to put it, here is another one: Turning a.b into a["b"] (dynamic access that can fail) is what I dislike most -- you get most of the problems of dynamic typing within Nim.
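
The concern can be seen with today's experimental dot operators and std/json. In this sketch (the JSON keys are made up), a misspelled field still compiles and only fails when the code runs:

```nim
{.experimental: "dotOperators".}

import std/json

# Forward any unknown field access to a runtime key lookup.
template `.`(js: JsonNode, field: untyped): JsonNode =
  js[astToStr(field)]

let node = parseJson("""{"lang": "nim"}""")

doAssert node.lang.getStr == "nim"

# The typo `lnag` is not caught by the compiler;
# the lookup raises KeyError at runtime instead.
doAssertRaises(KeyError):
  discard node.lnag
```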

n0bra1n3r commented 3 years ago

> you get most of the problems of dynamic typing within Nim.

Yes. I would dare say that any operator that has . in it (.?, .!, etc.) like what is proposed in RFC https://github.com/nim-lang/RFCs/issues/341 has this weakness. This RFC also carries some of those disadvantages, but I would go so far as to say that this one is superior to dot-like operators because of the limitations it imposes, as well as the flexibility it allows.

One more important advantage of this RFC (aside from what's already mentioned) over dot/dot-like operators is module locality: dot and dot-like operators can be declared in any module, separate from the type they operate on, whereas =call in this RFC would behave similarly to =destroy, which has to be declared in the same module as the type it operates on. This makes it clear that there is a certain amount of magic involved when working with a type that has =call.

Araq commented 3 years ago

=call is an alien beast though. The other type-bound operators are lifted automatically: you define =copy for CustomObj and it's not skipped for a tuple of CustomObj. =call has no such lifting requirements.

n0bra1n3r commented 3 years ago

Well, can't argue against that... 😂 Sounds like a showstopper if enforcing same-module declaration for =call and its operand can't be implemented without hacks.

Araq commented 3 years ago

It can easily be implemented either way. But we can also easily enforce that dot operators must reside in the same module as the type they belong to. Don't worry too much about the implementation, we should focus on getting the design right.
