substrait-io / substrait-java

Apache License 2.0
72 stars 70 forks source link

feat: support UDT literals using bytestring/binary representation in calcite #232

Closed cheikhachraf closed 4 months ago

cheikhachraf commented 4 months ago

calcite does not accept RelDatatypes with sqltypename.OTHER and has no way of representing it as of now. It will reject it at different steps such as RexLiteral Precondition type check or when building using the RexBuilder.

There's a ticket for this on Calcite's board but nothing as of of now.

one hacky solution would be:

  1. convert the substrait Expression.UDTLiteral into a VARBINARY/ByteString type. (the issue with this is that the Userdatatype.getsqlName would have to be set to sqlname.varbinary, otherwise the type check in calcite will fail)
  2. and then when converting back, just check if the RelDatatype inherits from our UserDataType and use that to do the conversion correctly.
CLAassistant commented 4 months ago

CLA assistant check
All committers have signed the CLA.

vbarua commented 4 months ago

Thanks for working on this!

One thing I think would be helpful would be to add a full roundtrip test to verify that Substrait inputs with user-defined type literals can go from PROTO -> POJO -> CALCITE and back fully.

There's an example for testing user-defined types in CustomFunctionTest, it could make sense to add a test to this class (maybe with a rename) or make a new class.

There is code for something like this in assertFullRoundTrip(Rel pojo1), however that assumes that only standard extensions and functions are loaded (for now).

vbarua commented 4 months ago

We've had enough activity this week for there to have been merge conflicts 😅 I rebased this PR off of the current main branch to deal with this.