A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
BSD 2-Clause "Simplified" License
Enable TestBinaryDuplicateFunction and TestBinaryObjectSizeFunction for Trino in the module of transportable-udfs-example-udfs #131
With the upgrade to Trino v406, these two test classes have been temporarily disabled for Trino for the following reason:
These tests run on Trino through Trino's QueryAssertions test infrastructure, which always executes the function under test with two forms of the query. The first is the plain query (e.g. SELECT "binary_duplicate"(a0) FROM (VALUES ROW(from_base64('YmFy'))) t(a0); and SELECT "binary_size"(a0) FROM (VALUES ROW(from_base64('Zm9v'))) t(a0);). The second adds a "where RAND()>0" clause (e.g. SELECT "binary_duplicate"(a0) FROM (VALUES ROW(from_base64('YmFy'))) t(a0) where RAND()>0; and SELECT "binary_size"(a0) FROM (VALUES ROW(from_base64('Zm9v'))) t(a0) where RAND()>0;). QueryAssertions verifies that the outputs of both queries are equal; otherwise the test fails.
However, executing the query with the where clause goes through VariableWidthBlockBuilder.writeByte(), which creates the input byte array in the Slice with an initial capacity of 32 bytes, while executing the query without the where clause does not go through VariableWidthBlockBuilder.writeByte() and creates the input byte array in the Slice with the actual size of the content. Therefore, the outputs of the two queries differ.
As the code causing the problem lies on the Trino side, these tests should be re-enabled once the fix is made in Trino.
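
The capacity mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use the Trino or Airlift APIs: ByteView, exactCapacity, builderCapacity, sizeOfBackingArray, and sizeOfContent are hypothetical names introduced only for illustration, and the sketch assumes that the affected functions end up reading the whole backing array rather than only the logical content length.

```java
import java.nio.charset.StandardCharsets;

public class BackingArrayDemo {
    /** Hypothetical stand-in for a slice: a logical view over a possibly larger array. */
    public static final class ByteView {
        final byte[] base;   // backing array, may be larger than the content
        final int length;    // number of bytes that are actual content

        ByteView(byte[] base, int length) {
            this.base = base;
            this.length = length;
        }
    }

    /** Models the query without the where clause: the backing array fits the content exactly. */
    public static ByteView exactCapacity(byte[] content) {
        return new ByteView(content.clone(), content.length);
    }

    /** Models the query with the where clause: content is written into a pre-sized buffer. */
    public static ByteView builderCapacity(byte[] content, int initialCapacity) {
        byte[] buffer = new byte[Math.max(initialCapacity, content.length)];
        System.arraycopy(content, 0, buffer, 0, content.length);
        return new ByteView(buffer, content.length);
    }

    /** Depends on how the input was built (assumed to mirror the failing behavior). */
    public static int sizeOfBackingArray(ByteView view) {
        return view.base.length;
    }

    /** Depends only on the logical content, so both query forms agree. */
    public static int sizeOfContent(ByteView view) {
        return view.length;
    }

    public static void main(String[] args) {
        byte[] foo = "foo".getBytes(StandardCharsets.UTF_8); // decoded form of 'Zm9v'
        ByteView plain = exactCapacity(foo);
        ByteView filtered = builderCapacity(foo, 32);

        // prints "3 vs 32": the two inputs carry the same content but differ in capacity
        System.out.println(sizeOfBackingArray(plain) + " vs " + sizeOfBackingArray(filtered));
        // prints "3 vs 3"
        System.out.println(sizeOfContent(plain) + " vs " + sizeOfContent(filtered));
    }
}
```

Under these assumptions, any result derived from the backing array differs between the two query forms even though the content is identical, which is the kind of mismatch QueryAssertions reports.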