substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

Rationalize/consolidate type grammar #686

Open jacques-n opened 3 months ago

jacques-n commented 3 months ago

There are several grammars for the same thing in Substrait subprojects.

  1. There was originally a type parser built in substrait-java
  2. Later, someone took that and reimplemented it in substrait-validator
  3. Later again, substrait-cpp created its own parser without ANTLR

We need to rationalize these parsers (and any others that exist) and create a single parser in the core substrait repository.

I have no idea if they agree in behavior. I believe the Java one also supports parsing to the other proto type definitions that aren't used in plans today (they were intended for consumers informing producers of UDF behavior).

It's reasonable for each language binding to have its own specific implementation using a common grammar, but we should avoid having distinct grammars across subprojects.
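
One way to find out whether the existing parsers agree in behavior might be a shared corpus of type strings that every implementation is run against. The following is only a rough sketch under stated assumptions: `CORPUS`, `toy_parse`, `strict_parse`, and `disagreements` are hypothetical names, the type strings are illustrative rather than taken from any existing grammar, and the two toy functions merely stand in for the real parsers in the subprojects.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shared corpus: (input type string, expected canonical form).
# The syntax here is illustrative only, not the agreed grammar.
CORPUS: List[Tuple[str, str]] = [
    ("i32", "i32"),
    ("I32?", "i32?"),
    ("decimal< 10 , 2 >", "decimal<10,2>"),
    ("list<i32?>", "list<i32?>"),
]


def toy_parse(text: str) -> str:
    """Toy stand-in for one binding: strips whitespace and lowercases."""
    return "".join(text.split()).lower()


def strict_parse(text: str) -> str:
    """Toy stand-in for another binding: rejects any whitespace."""
    if any(ch.isspace() for ch in text):
        raise ValueError(f"unexpected whitespace in {text!r}")
    return text.lower()


def disagreements(
    parsers: Dict[str, Callable[[str], str]],
) -> List[Tuple[str, str, str]]:
    """Return (parser name, input, outcome) for each corpus case that fails."""
    failures = []
    for name, parse in parsers.items():
        for source, expected in CORPUS:
            try:
                got = parse(source)
            except ValueError as err:
                got = f"error: {err}"
            if got != expected:
                failures.append((name, source, got))
    return failures
```

Calling `disagreements({"toy": toy_parse, "strict": strict_parse})` would list the cases where an implementation diverges from the corpus. In practice each entry in the dictionary would wrap a real binding's parser, and the corpus would live next to the common grammar.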

EpsilonPrime commented 3 months ago

There are also two in substrait-cpp.

jacques-n commented 3 months ago

Type parsers? Can you provide pointers to them?

EpsilonPrime commented 3 months ago

This has support for types and parameterized types: https://github.com/substrait-io/substrait-cpp/tree/main/src/substrait/type

There's a wrapper with slight improvements for text parsing here: https://github.com/substrait-io/substrait-cpp/blob/1dbf98b548de3ef11cac2a42075b87f57e7004b9/src/substrait/textplan/parser/SubstraitPlanTypeVisitor.cpp#L24

mbrobbel commented 3 months ago

> create a single parser

Shouldn't we add the grammar definition to the core repo and keep the implementations in the language repos?

jacques-n commented 3 months ago

> Shouldn't we add the grammar definition to the core repo and keep the implementations in the language repos?

Yes, that is what this ticket is trying to suggest. I'll update the description to make that clearer.