Open pearu opened 4 years ago
This issue concerns C++ UDFs/UDTFs, not RBC UDFs/UDTFs. The task is to make sure that C++ UDFs/UDTFs that can be executed only on CPU do not force all UDFs/UDTFs to be executed on CPU. This seems to happen because all C++ UDFs/UDTFs end up in the CPU-specific LLVM module (?).
Reproducer:
1. Define a runtime UDF test function:
def test_simple_udf(omnisci):
    @omnisci('int32(int32)')
    def simple_udf(x):
        return x + 1
    query = 'select simple_udf(123)'
    descr, result = omnisci.sql_execute(query)
    result = list(result)
2. Define a loadtime UDF in sample_udf.cpp:
#include <cstdint>
#define EXTENSION_NOINLINE extern "C" NEVER_INLINE DEVICE
EXTENSION_NOINLINE int32_t udf_diff(const int32_t x, const int32_t y) { return x - y; }
3. Start server and run runtime UDF test:
$ bin/omnisci_server --enable-runtime-udf --enable-table-functions
compileWorkUnit#2398: udf_cpu_module null
compileWorkUnit#2399: udf_gpu_module null
compileWorkUnit#2400: rt_udf_cpu_module defines: simple_udf__cpu_0,
compileWorkUnit#2401: rt_udf_gpu_module defines: simple_udf__gpu_0,
generateNativeGPUCode#974: module defines: multifrag_query_hoisted_literals, simple_udf__gpu_0, query_group_by_template, agg_id_shared, record_error_code, get_scan_output_slot, row_func_hoisted_literals, filter_func_hoisted_literals,
which indicates that `simple_udf` gpu implementation is used.
4. Start server with loadtime UDF and run runtime UDF test:
$ bin/omnisci_server --enable-runtime-udf --enable-table-functions --udf sample_udf.cpp
compileUdf#413: udf_filename="sample_udf.cpp"
error: cannot find libdevice for sm_75. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
compileWorkUnit#2398: udf_cpu_module defines: udf_diff,
compileWorkUnit#2399: udf_gpu_module defines: udf_diff,
compileWorkUnit#2400: rt_udf_cpu_module defines: simple_udf__cpu_0,
compileWorkUnit#2401: rt_udf_gpu_module defines: simple_udf__gpu_0,
compileWorkUnit#2398: udf_cpu_module defines: udf_diff,
compileWorkUnit#2399: udf_gpu_module defines: udf_diff,
compileWorkUnit#2400: rt_udf_cpu_module defines: simple_udf__cpu_0,
compileWorkUnit#2401: rt_udf_gpu_module defines: simple_udf__gpu_0,
generateNativeCPUCode#364: module defines: agg_id, record_error_code, get_scan_output_slot, multifrag_query_hoisted_literals, simple_udf__cpu_0, query_group_by_template, row_func_hoisted_literals, filter_func_hoisted_literals,
which indicates that `simple_udf` cpu implementation is forced.
The "cannot find libdevice" error is explained in https://stackoverflow.com/questions/59826961/fail-to-link-cuda-example-with-clang-9-under-ubuntu-18-04. Solution: use clang 11 (I was using clang 9).
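The device check done by eye in steps 3 and 4 could be automated by scanning the server log. The following is only a sketch: it assumes one log record per line, the `generateNative<DEVICE>Code#<n>: module defines: ...` format seen in the excerpts above, and the `<name>__<device>_<n>` naming of runtime UDF variants inferred from those same excerpts. `udf_devices` is a hypothetical helper, not part of the server or of RBC.

```python
import re

# Assumed record format, inferred from the log excerpts in this issue:
#   generateNative<DEVICE>Code#<line>: module defines: name1, name2, ...
_CODEGEN_RE = re.compile(r'generateNative(CPU|GPU)Code#\d+: module defines:\s*(.*)')


def udf_devices(log_text, udf_name):
    """Return the set of devices ('CPU', 'GPU') whose generated module
    defines a device-specific variant of udf_name."""
    devices = set()
    for line in log_text.splitlines():
        m = _CODEGEN_RE.search(line)
        if m is None:
            continue  # not a native codegen record
        device, names = m.groups()
        defined = [n.strip() for n in names.split(',') if n.strip()]
        # Assumed naming scheme: simple_udf__gpu_0, simple_udf__cpu_0, ...
        prefix = f'{udf_name}__{device.lower()}_'
        if any(n.startswith(prefix) for n in defined):
            devices.add(device)
    return devices
```

With the step 3 log this would report `{'GPU'}` for `simple_udf`, and with the step 4 log `{'CPU'}`, which is the regression this issue describes.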
This issue requires a test that triggers multiple query execution steps, for example one that uses the SQL HAVING clause.
Actually, any composite query would trigger multiple query execution steps. For instance,
select bar(out0) from table(foo(cursor(select x from mytable)))
that involves three execution steps:
1. select x from mytable
2. select ... from table(foo(cursor(...)))
3. select bar(out0) from ...
and the aim is to ensure that steps 1 and 3 are executed on GPU when 2 is running on CPU.
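The intended behavior can be illustrated with a toy model of per-step device selection. This is not the server's implementation: `Step`, `plan_eager`, and `plan_lazy` are hypothetical names, and the capabilities of `foo` (assumed CPU-only) and `bar` (assumed GPU-capable) are assumptions made only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    functions: tuple  # names of the UDFs/UDTFs this step actually calls


def plan_eager(steps, gpu_capable):
    """Eager linking: every registered function is linked into every step,
    so a single CPU-only function forces all steps to CPU."""
    all_on_gpu = all(gpu_capable.values())
    return {s.name: 'GPU' if all_on_gpu else 'CPU' for s in steps}


def plan_lazy(steps, gpu_capable):
    """Lazy linking: a step falls back to CPU only if a function it
    actually calls cannot run on GPU."""
    return {
        s.name: 'GPU' if all(gpu_capable[f] for f in s.functions) else 'CPU'
        for s in steps
    }


# Assumed capabilities: foo is CPU-only, bar can run on GPU.
gpu_capable = {'foo': False, 'bar': True}
steps = [
    Step('select x from mytable', ()),
    Step('select ... from table(foo(cursor(...)))', ('foo',)),
    Step('select bar(out0) from ...', ('bar',)),
]

for name, device in plan_lazy(steps, gpu_capable).items():
    print(f'{device}: {name}')
```

In this model `plan_eager` puts all three steps on CPU, while `plan_lazy` keeps steps 1 and 3 on GPU and only step 2 on CPU, which is the outcome a test for this issue would need to assert.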
Lazily link udfs/udtfs so that we don’t force all kernels to CPU if one UDTF or UDF cannot run on GPU