substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

YAML: OMG #448

Closed: elharo closed this issue 1 year ago

elharo commented 1 year ago

YAML is not well defined, reliably parseable, or authorable.

See https://noyaml.com/, among many others.

Ideally, Substrait should not use YAML at all. If it keeps something like the existing format for extensions, it needs to define the syntax of that format extremely carefully, without reference to YAML. Alternatively, it could use a better-defined, less dangerous syntax such as JSON or XML.

westonpace commented 1 year ago

The syntax of the document is defined with a JSON schema here: https://github.com/substrait-io/substrait/blob/main/text/simple_extensions_schema.yaml

Given that multiple libraries already parse these documents (substrait-java, substrait-cpp, and the substrait validator), and that a tool parses the YAML to generate documentation, changing the format at this point would not be trivial.

The content is fairly well structured, and most users will not have to author YAML. I'm not convinced that the use of YAML is a risk to the project's success.