substrait-io / substrait-java

The namespace /functions_arithmetic.yaml is loaded but no aggregate function with this key was found #156

Closed JosepSampe closed 1 year ago

JosepSampe commented 1 year ago

I'm trying to load an ibis-generated Substrait plan into Calcite, but substrait-java seems unable to map the sum function (and possibly others).

On the Python side, I have this code, which ends up creating an ibis.proto file:

from ibis_substrait.compiler.core import SubstraitCompiler
from google.protobuf import json_format
import ibis

if __name__ == '__main__':
    t = ibis.table(
        [("a", "string"), ("b", "float"), ("c", "float"), ("d", "int64"), ("e", "int64")],
        "t",
    )

    expr = t.group_by(["a", "b"]).aggregate([t.c.sum().name("suma")]).select("b", "suma")
    # expr = t.group_by(["a", "b"]).select("b") #  <- The plan generated from this expr is loaded properly in calcite
    ibis.show_sql(expr)

    compiler = SubstraitCompiler()
    plan = compiler.compile(expr)

    print(json_format.MessageToJson(plan))

    with open("ibis.proto", "wb") as f:
        f.write(plan.SerializeToString())

On the java side, I'm doing this:

SimpleExtension.ExtensionCollection extensions = SimpleExtension.loadDefaults();
FileInputStream fis = new FileInputStream("ibis.proto");
io.substrait.proto.Plan protoPlanLoaded = io.substrait.proto.Plan.parseFrom(fis);
ProtoPlanConverter protoPlanConverter = new ProtoPlanConverter(extensions);
Plan sPlan = protoPlanConverter.from(protoPlanLoaded); // <- It fails here
...

The exception I get is this:

Exception in thread "main" java.lang.IllegalArgumentException: Unexpected aggregate function with key sum. The namespace /functions_arithmetic.yaml is loaded but no aggregate function with this key was found.
    at io.substrait.extension.SimpleExtension$ExtensionCollection.getAggregateFunction(SimpleExtension.java:693)
    at io.substrait.extension.AbstractExtensionLookup.getAggregateFunction(AbstractExtensionLookup.java:35)
    at io.substrait.relation.ProtoRelConverter.newAggregate(ProtoRelConverter.java:378)
    at io.substrait.relation.ProtoRelConverter.from(ProtoRelConverter.java:72)
    at io.substrait.relation.ProtoRelConverter.newProject(ProtoRelConverter.java:341)
    at io.substrait.relation.ProtoRelConverter.from(ProtoRelConverter.java:84)
    at io.substrait.plan.ProtoPlanConverter.from(ProtoPlanConverter.java:40)

Checking the plans, I can see that the plan generated using ibis contains this:

 "extensionUris": [
    {
      "extensionUriAnchor": 1,
      "uri": "/functions_arithmetic.yaml"
    }
  ],
  "extensions": [
    {
      "extensionFunction": {
        "extensionUriReference": 1,
        "functionAnchor": 1,
        "name": "sum"
      }
    }
  ],

While if I generate the same plan using substrait-java (using SqlToSubstrait()), the plan contains this:

  "extensionUris": [
    {
      "extensionUriAnchor": 1,
      "uri": "/functions_arithmetic.yaml"
    }
  ],
  "extensions": [
    {
      "extensionFunction": {
        "extensionUriReference": 1,
        "name": "sum:fp32"
      }
    }
  ],

Note that ibis names the function sum, while substrait-java names it sum:fp32.

If I print extensions.aggregateFunctions() on the Java side, I can see that all the functions include a data-type suffix:

[bool_and:bool, bool_or:bool, count:any, count:, any_value:any, approx_count_distinct:any, sum:dec, avg:dec, min:dec, max:dec, sum:i8, sum:i16, sum:i32, sum:i64, sum:fp32, sum:fp64, sum0:i8, sum0:i16, sum0:i32, sum0:i64, sum0:fp32, sum0:fp64, avg:i8, avg:i16, avg:i32, avg:i64, avg:fp32, avg:fp64, min:i8, min:i16, min:i32, min:i64, min:fp32, min:fp64, min:ts, min:tstz, max:i8, max:i16, max:i32, max:i64, max:fp32, max:fp64, max:ts, max:tstz, product:i8, product:i16, product:i32, product:i64, product:fp32, product:fp64, std_dev:fp32, std_dev:fp64, variance:fp32, variance:fp64, corr:fp32_fp32, corr:fp64_fp64, mode:i8, mode:i16, mode:i32, mode:i64, mode:fp32, mode:fp64, median:req_i8, median:req_i16, median:req_i32, median:req_i64, median:req_fp32, median:req_fp64, quantile:req_req_i64_any, string_agg:str_str]
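Judging only from the listing above, each key looks like the function name, a colon, and the argument type names joined by underscores (with nothing after the colon for a zero-argument function such as count). A minimal sketch of that pattern follows; `compound_key` and the `registered` set are hypothetical names for illustration, not part of substrait-java or ibis-substrait.

```python
def compound_key(name, arg_types):
    """Build a key such as 'sum:fp32' from a function name and its argument types.

    This mirrors the pattern visible in the printed aggregate functions;
    it is an inference from that output, not the library's actual code.
    """
    return f"{name}:{'_'.join(arg_types)}"


# A few keys copied from the listing above.
registered = {"sum:fp32", "sum:fp64", "corr:fp64_fp64", "count:"}

print(compound_key("sum", ["fp32"]) in registered)  # True
print("sum" in registered)                          # False
```

Under this reading, a bare sum cannot match any registered key, which would be consistent with the exception shown earlier.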

I don't know if I'm missing something or if this is actually an issue. I wonder whether this difference in function naming between ibis and substrait-java is the reason ProtoPlanConverter cannot map the sum function when building the substrait-java Plan. Do you know of a workaround?

Note that if, on the Python side, I create a plan that does not include any scalar or aggregate function, the plan is loaded correctly into a substrait-java Plan and then into Calcite.

jacques-n commented 1 year ago

Unless I missed something special we defined, that looks like a bug on the Ibis side. There are many sum impls in that file and the data type is required to determine which specific one should be used. I suggest opening a bug on Ibis to that effect.
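Until the producer emits typed names, one possible stopgap is to patch the function names in the plan's JSON form before handing it to substrait-java. The sketch below operates on a plain dict shaped like the extensions section shown earlier. `qualify_function_names` and the `signatures` mapping are hypothetical: the caller has to supply the correct suffix for each function anchor based on the actual argument types, and the fp32 used here is only a placeholder taken from the substrait-java plan above.

```python
import copy


def qualify_function_names(plan, signatures):
    """Return a copy of a plan dict with bare function names given a type suffix.

    plan:       dict form of the plan (for example, parsed from the JSON
                printed by json_format.MessageToJson)
    signatures: hypothetical caller-supplied mapping of
                functionAnchor -> signature suffix, e.g. {1: "fp32"}
    """
    fixed = copy.deepcopy(plan)
    for ext in fixed.get("extensions", []):
        func = ext.get("extensionFunction")
        if func is None:
            continue  # not a function declaration
        name = func.get("name", "")
        if ":" in name:
            continue  # already a compound name, leave untouched
        anchor = func.get("functionAnchor", 0)
        if anchor in signatures:
            func["name"] = f"{name}:{signatures[anchor]}"
    return fixed


plan = {
    "extensionUris": [
        {"extensionUriAnchor": 1, "uri": "/functions_arithmetic.yaml"}
    ],
    "extensions": [
        {
            "extensionFunction": {
                "extensionUriReference": 1,
                "functionAnchor": 1,
                "name": "sum",
            }
        }
    ],
}

fixed = qualify_function_names(plan, {1: "fp32"})
print(fixed["extensions"][0]["extensionFunction"]["name"])  # sum:fp32
```

The patched dict could then be converted back to a protobuf message (for instance with json_format.ParseDict) and serialized as before. This only papers over the naming mismatch; it does not check that the chosen suffix matches the types the plan actually uses.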

JosepSampe commented 1 year ago

Thanks @jacques-n, I opened the issue in https://github.com/ibis-project/ibis-substrait/issues/703