substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0
1.14k stars 148 forks source link

Support alias for rel #571

Closed my-vegetable-has-exploded closed 1 month ago

my-vegetable-has-exploded commented 9 months ago

Since substrait always use field indices to refer to fields,it may cause some problem when revert from substrait to logical plan.

Taking datafusion-6867 for example, datafusion use table+field to distinguish fields.

SELECT d1.a FROM data d1 JOIN data d2 ON d1.e <> d2.e

Without alias for rel, it can't distinguish between left input and right input in self join.

I think alias for rel is not complicated to implement and improves ease of use.

A simple design is given below.

message AliasRel {
  RelCommon common = 1;
  Rel input = 2;
  // alias name
  string names = 3;
  substrait.extensions.AdvancedExtension advanced_extension = 10;
}
westonpace commented 9 months ago

Thanks for creating this issue. The concept of aliases has come up in community discussions in the past but I can't find a good concrete issue for this idea.

I think it would be a good idea to add aliases as an "optional extension" or an "official hint" of sorts. We have also talked in the past about having some way to annotate parts of the plan with extra information.

My concern with your proposed AliasRel is that it will cause the plan to become unreadable by engines that do not understand it. Or, to put it another way, it forces all consumers to handle aliases whether they care about them or not.

I think something using AdvancedExtension might be an easier path forward. In particular, the enhancement type of advanced extension specifically says that the information can be ignored by consumers if they choose to do so. It's also something that could be done without any change to the Substrait spec.

I'm also going to CC @jacques-n who has talked about "annotations" before and I think was very close to writing up some kind of proposal / idea doc at one point in time.

my-vegetable-has-exploded commented 9 months ago

Thanks for your patience!

I think it would be a good idea to add aliases as an "optional extension" or an "official hint" of sorts. We have also talked in the past about having some way to annotate parts of the plan with extra information.

That's really a better idea.

I think something using AdvancedExtension might be an easier path forward. In particular, the enhancement type of advanced extension specifically says that the information can be ignored by consumers if they choose to do so. It's also something that could be done without any change to the Substrait spec.

Thank you for your advice. Do you mean that we can put the alias name into AdvancedExtension? By the way, the comment shows that enhancement cannot be ignored by a consumer which iis a little different from your description.

  // An enhancement alter semantics. Cannot be ignored by a consumer.
  google.protobuf.Any enhancement = 2;
westonpace commented 9 months ago

Do you mean that we can put the alias name into AdvancedExtension?

Yes

By the way, the comment shows that enhancement cannot be ignored by a consumer which iis a little different from your description.

Good catch, you are right, I got it backwards. optimization is the one to use.