ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

UDF tests fail due to bad test environment #873

Closed: maxmzkr closed 8 years ago

maxmzkr commented 8 years ago

I'm trying to get my test environment set up, and I've gotten everything to work except the UDF tests.

My tests fail with this:

__________________________________________________________________________________________________________________ TestUDFE2E.test_decimal ___________________________________________________________________________________________________________________

self = <ibis.impala.tests.test_udf.TestUDFE2E testMethod=test_decimal>

    @pytest.mark.udf
    def test_decimal(self):
        col = self.con.table('tpch_customer').c_acctbal
        literal = ibis.literal(1).cast('decimal(12,2)')
        name = '__tmp_udf_' + util.guid()
        func = self._udf_creation_to_op(name, 'Identity',
                                        ['decimal(12,2)'],
>                                       'decimal(12,2)')

ibis/impala/tests/test_udf.py:204: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/impala/tests/test_udf.py:325: in _udf_creation_to_op
    self.con.create_function(func, database=self.test_data_db)
ibis/impala/client.py:1212: in create_function
    self._execute(stmt)
ibis/client.py:154: in _execute
    cur = self.con.execute(query)
ibis/impala/client.py:119: in execute
    cursor.execute(query, async=async)
ibis/impala/client.py:227: in execute
    self._cursor.execute_async(stmt)
local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:289: in execute_async
    self._execute_async(op)
local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:308: in _execute_async
    operation_fn()
local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:286: in op
    async=True)
local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:919: in execute
    return self._operation('ExecuteStatement', req)
local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:849: in _operation
    resp = self._rpc(kind, request)
local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:817: in _rpc
    err_if_rpc_not_ok(response)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

resp = TExecuteStatementResp(status=TStatus(errorCode=None, errorMessage='AnalysisExc...n\n', sqlState='HY000', infoMessages=None, statusCode=3), operationHandle=None)

    def err_if_rpc_not_ok(resp):
        if (resp.status.statusCode != TStatusCode.SUCCESS_STATUS and
                resp.status.statusCode != TStatusCode.SUCCESS_WITH_INFO_STATUS and
                resp.status.statusCode != TStatusCode.STILL_EXECUTING_STATUS):
>           raise HiveServer2Error(resp.status.errorMessage)
E           HiveServer2Error: AnalysisException: Could not load binary: /__ibis/ibis-testing-data/udf/udf-sample.ll
E           Could not parse module /tmp/udf-sample.8923.1.ll: Global not a pointer type!

local/lib/python2.7/site-packages/impyla-0.13.8-py2.7.egg/impala/hiveserver2.py:604: HiveServer2Error
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==================================================================================================== 1 failed, 108 passed, 720 skipped in 181.47 seconds =====================================================================================================
Exception AttributeError: "'NoneType' object has no attribute 'IbisError'" in <bound method ImpalaTemporaryTable.__del__ of ImpalaTemporaryTable('__ibis_tmp_ee9084c5718e4310be9b229dc3e80d02.`delimited_table_test1`', ibis.Schema {  
  foo  string
  bar  double
  baz  int8
}, <ibis.impala.client.ImpalaClient object at 0x7fa04c8a83d0>)> ignored

It looks like my UDFs are being compiled incorrectly, and I'm not sure how to go about debugging this. Do you think this is a version issue with LLVM/clang? Is my Impala environment set up incorrectly? Or am I doing something else completely wrong?

wesm commented 8 years ago

You have to have the same LLVM version on your PATH that was used to compile Impala. I have code in my bash profile that looks like this:

export IBIS_TEST_HDFS_SUPERUSER=wesm
export IBIS_TEST_LLVM_CONFIG=$NATIVE_TOOLCHAIN/llvm-3.3-p1/bin/llvm-config
export IBIS_POSTGRES_USER=REDACTED
export IBIS_POSTGRES_PASS=REDACTED
export IBIS_WEBHDFS_PORT=5070

function ibis_toolchain {
    export PATH="$($IBIS_TEST_LLVM_CONFIG --bindir):$PATH"
    export IMPALA_GCC_VERSION=4.9.2
    echo $PATH
}

Impala versions prior to 2.6.0 use LLVM 3.3 (see IMPALA-775, where the project upgraded to 3.8). You can use https://github.com/cloudera/native-toolchain if you need to build LLVM from source.
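
To confirm which LLVM the test run will pick up, a small check along these lines may help. It is only a sketch: the helper names (`parse_llvm_version`, `describe_mismatch`, `installed_llvm_version`) are hypothetical and not part of ibis, and the expected version of 3.3 is an assumption taken from the comment above about Impala releases before 2.6.0. It relies on `llvm-config --version`, and falls back to whatever `llvm-config` is on the PATH when `IBIS_TEST_LLVM_CONFIG` is not set.

```python
import os
import re
import subprocess

# Assumption: the target Impala (< 2.6.0) was built against LLVM 3.3.
EXPECTED_LLVM = (3, 3)


def parse_llvm_version(text):
    """Extract (major, minor) from llvm-config output such as '3.3' or '3.8.0svn'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2))


def describe_mismatch(found, expected=EXPECTED_LLVM):
    """Return None when the versions agree, otherwise a human-readable message."""
    if found == expected:
        return None
    return "LLVM %d.%d found, but %d.%d is expected" % (found + expected)


def installed_llvm_version(llvm_config=None):
    """Run llvm-config (from IBIS_TEST_LLVM_CONFIG or the PATH) and parse its version."""
    llvm_config = llvm_config or os.environ.get(
        "IBIS_TEST_LLVM_CONFIG", "llvm-config")
    output = subprocess.check_output([llvm_config, "--version"])
    return parse_llvm_version(output.decode("utf-8"))
```

Running `describe_mismatch(installed_llvm_version())` before the UDF tests returns a message if the local toolchain differs from the assumed 3.3. A mismatch is one plausible cause of a "Could not parse module" error like the one above, though this check alone does not prove it.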

wesm commented 8 years ago

It would be nice to start a developer documentation file in Markdown; so far the only docs are for users.

In the interim, I would also recommend adding --skip-udf to your py.test call to skip these tests.

maxmzkr commented 8 years ago

Thanks! That should help!