Generate `toBuilder` overrides with covariate return type

aabdagic commented 5 years ago

At the moment toBuilder methods are only defined on the GeneratedMessage, which means they are always of type GeneratedMessage. This works poorly when no-implicit-casts is used as it requires explicit casting to the actual type.

sigurdm commented 5 years ago

I see the problem. I had hoped to avoid generating toBuilder for each subclass.

The real solution as I see it is to make GeneratedMessage generic

class GeneratedMessage<M extends GeneratedMessage<M>> {
  ...
  M createEmptyInstance();

  M toBuilder() {
    final result = createEmptyInstance();
    result._fieldSet._shallowCopyValues(_fieldSet);
    return result;
  }
 ...
}

The message classes would then be like:

class Foo extends GeneratedMessage<Foo> {
  ...
  Foo createEmptyInstance() => new Foo();
  ...
}

I don't really know how this would affect code size and performance, and how hard the migration would be.

I guess the generic type would take up a field in each message instance. And that there would be inserted implicit checks a number of places.

@rakudrama, @nichite any insights?

Possibly it is not too bad. built_value does this already.

(https://github.com/google/built_value.dart/blob/master/built_value/lib/built_value.dart#L11)

eernstg commented 5 years ago

Just as an aside from a language evolution point of view, this would be a good example of a situation where a proper self-type and the ability to require re-implementation would work well:

class GeneratedMessage {
  ...
  This createEmptyInstance();
}

class Foo extends GeneratedMessage {
  ...
  This createEmptyInstance() => new Foo();
}

This would make it possible to know at call sites that the returned result has the same type as the receiver, and given that the static type of the receiver is a subtype of the dynamic type this would always allow us to get an optimal and safe approximation of the actual type of object returned.

The implementation Foo.createEmptyInstance is not safe: there will be a dynamic check when a Foo is returned), and if a subclass SubFoo inherits this implementation then this check will fail.

It wouldn't be hard to make it safe, however. We could have support for a kind of marker (for instance, we could annotate createEmptyInstance in Foo with @notInheritable) which would indicate that the implementation of createEmptyInstance can not be inherited by concrete subclasses of Foo, they must have their own overriding implementation if the method otherwise inherited has this marker. It's OK for an abstract subclass to omit this overriding.

(With the code generation approach there would have to be an overriding declaration in every subtype of GeneratedMessage, presumably abstract in abstract classes, or the typing would not be optimal at call sites.)

If this is handled using metadata like @notInheritable then the implementation is known to avoid the dynamic errors if there are no diagnostics about violations of the @notInheritable constraint. If it's made part of the language then this can be enforced. So this is a case where it is a viable option to do it with metadata, but performance could be better if the compiler can trust the returned value to have the required type, which is a known property of a program where all required overriding declarations are present, as would be the case with a proper language mechanism.

rakudrama commented 5 years ago

@sigurdm We should try to get to zero methods in the generated class that are overrides of the base class. This opens up the possibility of optimizations where the derived class is implemented as the base class.

I am worried about the code size impact of making GeneratedMessage generic, but it does seem like the right thing to do. If we go in that direction, I would make BuilderInfo generic too, and store create() in the BuilderInfo so that we could have one definition of createNewInstance:

class BuilderInfo<M extends GeneratedMessage<M>> {
  final M Function() _createFunction;
  BuilderInfo(this._createFunction, ...) {...}
   ...
}

class GeneratedMessage<M extends GeneratedMessage<M>> {
  final BuilderInfo<M> info_;
  final FieldSet _fieldSet;

  GeneratedMessage(this.info_) : _fieldSet = ... info_ ...;

  M createNewInstance() => (info_._createFunction)();
  M clone() => createNewInstance.mergeFrom(this);
}

class FooMessage extends GeneratedMessage<FooMessage> {
  static BuilderInfo<FooMessage> _i = BuilderInfo<FooMessage>(create, ....);

  static FooMessage create() => FooMessage._();
  FooMessage._() : super(_i);
}

My concern about code size is around the fact that FooMessage implements GeneratedMessage where M=FooMessage. Currently the binding of M in this case is done as an instance method of FooMessage. I hope to work on making the types less connected to the JavaScript constructors so this might be fixed in the (not near) future.

@eernstg The 'this-type' seems like a special case of M above. I am in favor of making common things easier. Does work in composed scenarios, e.g:

class GeneratedMessage {
  final BuilderInfo<This> info_;
  This createNewInstance() => (info_._createFunction)();
}

sigurdm commented 5 years ago

Actually there is a slight difference between M and This as I read it. This can only ever be the leaf-type itself - while M can be another type:

FooMock extends GeneratedMessage<Foo> {
  Foo createNewInstance() => Foo();
  ...
}

You could not do that with This.

I will look into a prototype of a Generic GeneratedMessage to see how it will work out.

sigurdm commented 5 years ago

I created a pull request to peruse the option - but didn't do experiments with it. On the surface level I quite like it though. It feels like the right thing to do.

eernstg commented 5 years ago

@rakudrama wrote:

The 'this-type' seems like a special case of M above.

It's not quite a special case. If I understand you correctly then we could say that the approach based on M extends GeneratedMessage<M> requires a redeclaration of that type variable (and, for non-leaf types, its constraint) down every path in the subtype graph, but we will live with that (especially when it's generated code). So if we just get the same properties, we could say that it's a special case.

But we don't get the same properties—in particular, the compiler cannot trust all subtypes to follow this programming idiom, which means that this is not guaranteed to have the type which is denoted by the "self" type variable (re-loading this page, I can see that @sigurdm mentioned the same kind of situation):

abstract class C1<X extends C1<X>> { X me(); } // Introduce `X` as a This type.
abstract class C2<X extends C2<X>> extends C1<X> {} // Maintain the This type variable.
class D extends C2<D> { D me() => this; } // As intended: `this` has type `D`.
class E extends C2<D> { E me() => this; } // Oops!

So it's a special case in the same sense as declaring a variable x to have type int is a special case of declaring it as num: The latter will allow x to have all the values allowed by the former, but the compiler can't rely on having only those things. So we'll make x.isEven an error.

About the BuilderInfo:

class GeneratedMessage {
  final BuilderInfo<This> info_;
  GeneratedMessage(this.info_);
  This createNewInstance() => (info_._createFunction)();
}

When a GeneratedMessage or a subtype thereof is created using a generative constructor, the instance has an exact type, and this means that it can be checked statically that the given info_ constructor argument has the required type. So we can create the generated message instance safely. Similarly, (info_._createFunction) is known to return an object of type This, so there is no need for a dynamic type check in the body of createNewInstance, and at the createNewInstance call site it is known statically that the returned result has a type which is a subtype of the static type of the receiver, with no need for redeclarations of anything in subtypes of GeneratedMessage. So it does work.

davidmorgan commented 5 years ago

Any update on this please?

I would imagine that requiring casts whenever someone uses toBuilder is more of a performance problem than using generics--unless people choose to not use toBuilder :)

I ended up here because I commented on what I thought were unnecessary type annotations in a code review--turns out they are needed after all, as they are casts.

long1eu commented 4 years ago

Any updates?

osa1 commented 2 years ago

GeneratedMessage.freeze also has the same problem: https://github.com/google/protobuf.dart/pull/631#issuecomment-1118904769

sigurdm commented 2 years ago

Some things can be done with extensions like we do with rebuild. Because extension methods are really static methods they can be generic in the message-type:

Here is how we do with rebuild:

extension GeneratedMessageGenericExtensions<T extends GeneratedMessage> on T {
  /// Apply [updates] to a copy of this message.
  ///
  /// Throws an [ArgumentError] if `this` is not already frozen.
  ///
  /// Makes a writable shallow copy of this message, applies the [updates] to
  /// it, and marks the copy read-only before returning it.
  T rebuild(void Function(T) updates) {
    if (!isFrozen) {
      throw ArgumentError('Rebuilding only works on frozen messages.');
    }
    final t = toBuilder();
    updates(t as T);
    return t..freeze();
  }

Can we write:

extension GeneratedMessageGenericExtensions<T extends GeneratedMessage> on T {
  T toBuilder() {
    final result = createEmptyInstance() as T;
    result._fieldSet._shallowCopyValues(_fieldSet);
    return result;
  }
}

Then we "push the cast one more step back".

Furthermore we could make "createEmptyInstance()" protected, and make an extension with the right type that does the cast.

abstract class GeneratedMessage {
  @protected
  GeneratedMessage $createEmptyInstance();
  [...]
}

class Foo extends GeneratedMessage {
  @override
  Foo $createEmptyInstance();
}

extension GeneratedMessageGenericExtensions<T extends GeneratedMessage> on T {
  T createEmptyInstance() => $createEmptyInstance as T;

  T toBuilder() {
    final result = createEmptyInstance();
    result._fieldSet._shallowCopyValues(_fieldSet);
    return result;
  }

  T rebuild(void Function(T) updates) {
    if (!isFrozen) {
      throw ArgumentError('Rebuilding only works on frozen messages.');
    }
    final t = toBuilder();
    updates(t);
    return t..freeze();
  }
}

davidmorgan commented 2 years ago

Fun stuff.

I looked briefly at moving GeneratedMessage methods to extension methods nearly a year ago; I was mostly curious as to whether cutting down on dynamic proto methods would improve compilation performance. (Since we've seen issues with proto compile performance due to number of generated subclasses). I got stuck on a CFE bug (since resolved) and never finished the investigation.

One thing to note is that it's disruptive to code if extension methods are used as they need a new 'import' to get the extension--most code that uses protos does not import it. This can be mitigated by adding an 'export' to all generated proto source, and that mostly works, but there can still be cases of protos being used without importing any proto where a new import will need adding.

sigurdm commented 2 years ago

One thing to note is that it's disruptive to code if extension methods are used as they need a new 'import' to get the extension--most code that uses protos does not import it. This can be mitigated by adding an 'export' to all generated proto source, and that mostly works, but there can still be cases of protos being used without importing any proto where a new import will need adding.

Good point! Also there are semantic differences between a static method and an instance method - so we might prefer new names for these while deprecating the old.

robert-j-webb commented 2 years ago

Any updates on this?

google / protobuf.dart

Generate `toBuilder` overrides with covariate return type #177