substrait-io / substrait-java

Apache License 2.0
77 stars 72 forks source link

Spark-Substrait: Functions with nullable marker not understood #289

Closed mbwhite closed 2 months ago

mbwhite commented 2 months ago

If I have a plan that contains a reference to the extension:

extensions {
  extension_function {
    extension_uri_reference: 1
    function_anchor: 1
    name: "not:bool?"
  }
}

This is not understood by the Spark library; the ? suffix is a later spec version after the original Spark-Substrait implementation.

Parsing a plan with this type of extension gives:

 Unexpected scalar function with key not:bool?. The namespace /functions_boolean.yaml is loaded but no scalar function with this key found.
        at io.substrait.extension.SimpleExtension$ExtensionCollection.getScalarFunction(SimpleExtension.java:636)
        at io.substrait.extension.AbstractExtensionLookup.getScalarFunction(AbstractExtensionLookup.java:24)
        at io.substrait.expression.proto.ProtoExpressionConverter.from(ProtoExpressionConverter.java:113)
        at io.substrait.relation.ProtoRelConverter.newFilter(ProtoRelConverter.java:146)
        at io.substrait.relation.ProtoRelConverter.from(ProtoRelConverter.java:70)
        at io.substrait.relation.ProtoRelConverter.newProject(ProtoRelConverter.java:371)
        at io.substrait.relation.ProtoRelConverter.from(ProtoRelConverter.java:88)
        at io.substrait.plan.ProtoPlanConverter.from(ProtoPlanConverter.java:40)
mbwhite commented 2 months ago

FYI @andrew-coleman

vbarua commented 2 months ago

Do you have an full example of this plan? I think the ? suffix is no longer a thing. It's not present in https://substrait.io/extensions/#function-signature-compound-names anymore.

mbwhite commented 2 months ago

@vbarua not so much the function name but the arguments...

Still here https://github.com/substrait-io/substrait/blob/aab01537d750f6fb296a7fe8e18299d04ecc7a83/extensions/functions_boolean.yaml#L29

Unless I've got completely the wrong understanding

vbarua commented 2 months ago

Creating a plan from Isthmus, you can see that the name of the function is not:bool regardless of the nullability. The name used in the extensionFunction is the substrait.io/extensions#function-signature-compound-names.

./isthmus --create "CREATE TABLE temp (b BOOLEAN, nb BOOLEAN)" "SELECT NOT b, NOT nb FROM temp"
{
  "extensionUris": [{
    "extensionUriAnchor": 1,
    "uri": "/functions_boolean.yaml"
  }],
  "extensions": [{
    "extensionFunction": {
      "extensionUriReference": 1,
      "functionAnchor": 0,
      "name": "not:bool"
    }
  }],
  "relations": [{
...

If I recall correctly ? in the YAML extension

    impls:
      - args:
          - value: boolean?
            name: a
        variadic:
          min: 0
        return: boolean?

indicates that the input argument to the function can be nullable

mbwhite commented 2 months ago

ah 💡 - makes sense... so my problem is the other way around - what was creating the plan was adding in the ? that isn't needed.

Thanks @vbarua ... nothing to see here @andrew-coleman ... move along :-)