substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

Rationalize/consolidate type grammar #686

Open jacques-n opened 3 weeks ago

jacques-n commented 3 weeks ago

There are several grammars for the same thing in Substrait subprojects.

  1. There was originally a type parser built in substrait-java
  2. Later, someone took that and reimplemented it in substrait-validator
  3. Later again, substrait-cpp created their own parser without ANTLR

We need to rationalize these parsers (and any others that exist) and create a single parser in the core substrait repository.

I have no idea whether they agree in behavior. I believe the Java one also supports parsing to the other proto type definitions that aren't used in plans today (they were intended for consumers informing producers of UDF behavior).

It's reasonable for each language binding to have its own implementation using a common grammar, but we should avoid having distinct grammars across subprojects.
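
To make concrete what a common grammar would pin down, here is a minimal sketch of a recursive-descent parser for type strings. Everything in it is an assumption for illustration: the grammar in the comment is a simplified, hypothetical one (not the grammar used by substrait-java, substrait-validator, or substrait-cpp), and the names `ParsedType` and `parse_type` are made up.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified grammar (NOT the actual Substrait grammar):
#   type  := NAME '?'? ( '<' param ( ',' param )* '>' )?
#   param := INT | type
TOKEN = re.compile(
    r"\s*(?:(?P<name>[A-Za-z_][A-Za-z0-9_]*)|(?P<int>\d+)|(?P<sym>[<>,?]))"
)


@dataclass
class ParsedType:
    name: str
    nullable: bool = False
    params: list = field(default_factory=list)


def tokenize(text):
    """Split a type string into (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _parse(tokens, pos):
    """Parse one `type` starting at tokens[pos]; return (node, next_pos)."""
    if pos >= len(tokens) or tokens[pos][0] != "name":
        raise ValueError("expected a type name")
    node = ParsedType(name=tokens[pos][1].lower())
    pos += 1
    # Optional nullability marker directly after the name.
    if pos < len(tokens) and tokens[pos] == ("sym", "?"):
        node.nullable = True
        pos += 1
    # Optional parameter list: integers or nested types.
    if pos < len(tokens) and tokens[pos] == ("sym", "<"):
        pos += 1
        while True:
            if pos < len(tokens) and tokens[pos][0] == "int":
                node.params.append(int(tokens[pos][1]))
                pos += 1
            else:
                child, pos = _parse(tokens, pos)
                node.params.append(child)
            if pos < len(tokens) and tokens[pos] == ("sym", ","):
                pos += 1
                continue
            if pos < len(tokens) and tokens[pos] == ("sym", ">"):
                pos += 1
                break
            raise ValueError("expected ',' or '>'")
    return node, pos


def parse_type(text):
    """Parse a complete type string, rejecting trailing input."""
    tokens = tokenize(text)
    node, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"trailing input after type: {tokens[pos][1]!r}")
    return node
```

Even in a toy like this, each implementation has to decide details such as where the `?` goes and whether names are case-sensitive. Those are the kinds of choices that could drift apart when every subproject keeps its own grammar.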

EpsilonPrime commented 3 weeks ago

There are also two in substrait-cpp.

jacques-n commented 3 weeks ago

Type parsers? Can you provide pointers to them?

EpsilonPrime commented 3 weeks ago

This has support for types and parameterized types: https://github.com/substrait-io/substrait-cpp/tree/main/src/substrait/type

There's a wrapper with slight improvements for text parsing here: https://github.com/substrait-io/substrait-cpp/blob/1dbf98b548de3ef11cac2a42075b87f57e7004b9/src/substrait/textplan/parser/SubstraitPlanTypeVisitor.cpp#L24

mbrobbel commented 3 weeks ago

> create a single parser

Shouldn't we add the grammar definition to the core repo and keep the implementations in the language repos?

jacques-n commented 3 weeks ago

> Shouldn't we add the grammar definition to the core repo and keep the implementations in the language repos?

Yes, that is what this ticket is trying to suggest. I'll update the description to make that clearer.