linkedin / transport

A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
BSD 2-Clause "Simplified" License

Enable TestBinaryDuplicateFunction and TestBinaryObjectSizeFunction for Trino in the module of transportable-udfs-example-udfs #131

Open yiqiangin opened 1 year ago

yiqiangin commented 1 year ago

With the upgrade to Trino v406, these two test classes are temporarily disabled in the Trino tests, for the following reason:

These tests run on Trino through Trino's test infrastructure class QueryAssertions, which always executes the function under test with two forms of the query. One is the plain query (e.g. SELECT "binary_duplicate"(a0) FROM (VALUES ROW(from_base64('YmFy'))) t(a0); and SELECT "binary_size"(a0) FROM (VALUES ROW(from_base64('Zm9v'))) t(a0);). The other adds a "where RAND()>0" clause (e.g. SELECT "binary_duplicate"(a0) FROM (VALUES ROW(from_base64('YmFy'))) t(a0) where RAND()>0; and SELECT "binary_size"(a0) FROM (VALUES ROW(from_base64('Zm9v'))) t(a0) where RAND()>0;). QueryAssertions verifies that the outputs of both queries are equal; otherwise the test fails.

However, executing the query with the where clause goes through VariableWidthBlockBuilder.writeByte(), which creates the input byte array in the Slice with an initial capacity of 32 bytes. Executing the query without the where clause does not go through VariableWidthBlockBuilder.writeByte(), and the input byte array in the Slice is created with the actual capacity of the content. Therefore, the outputs of the two queries differ.
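To make the capacity mismatch concrete, here is a minimal sketch in plain Java. It does not use Trino's Slice or VariableWidthBlockBuilder; it uses java.nio.ByteBuffer as a stand-in, and the class and method names (BackingArrayDemo, sizeFromBackingArray, sizeFromContent, exactCapacity, overAllocated) are hypothetical. It assumes, for illustration only, that a function derives its result from the full backing array rather than from the logical content length, which is one way the two query forms could produce different outputs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BackingArrayDemo {
    // Hypothetical function body that trusts the backing array's length.
    static int sizeFromBackingArray(ByteBuffer input) {
        return input.array().length;
    }

    // Hypothetical function body that uses only the logical content length.
    static int sizeFromContent(ByteBuffer input) {
        return input.remaining();
    }

    // Stand-in for the path without the where clause:
    // the backing array has exactly the size of the content.
    static ByteBuffer exactCapacity(byte[] content) {
        return ByteBuffer.wrap(content.clone());
    }

    // Stand-in for the path with the where clause:
    // the backing array starts at a fixed initial capacity (32 in the issue).
    static ByteBuffer overAllocated(byte[] content, int initialCapacity) {
        byte[] backing = new byte[Math.max(initialCapacity, content.length)];
        System.arraycopy(content, 0, backing, 0, content.length);
        return ByteBuffer.wrap(backing, 0, content.length);
    }

    public static void main(String[] args) {
        byte[] foo = "foo".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer exact = exactCapacity(foo);
        ByteBuffer padded = overAllocated(foo, 32);

        // Results diverge when the backing array length is used.
        System.out.println(sizeFromBackingArray(exact) + " vs " + sizeFromBackingArray(padded));
        // Results agree when only the logical content is used.
        System.out.println(sizeFromContent(exact) + " vs " + sizeFromContent(padded));
    }
}
```

In this sketch both buffers hold the same three bytes, so any comparison based on the logical content agrees, while anything based on the backing array's capacity does not.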

As the code causing the problem lies on the Trino side, these tests should be re-enabled once the fix is made in Trino.

xkrogen commented 1 year ago

cc @akshayrai any thoughts on this?