xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: Column pruning failed #688

Closed qinxuye closed 11 months ago

qinxuye commented 11 months ago

Describe the bug

I encountered this issue in https://github.com/xorbitsai/xorbits_sql/pull/1. With the column pruning optimization disabled, the test passes.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version
  2. The version of Xorbits you use
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack of the error.
  5. Minimized code to reproduce the error.
prepare_data = (<duckdb.DuckDBPyConnection object at 0x7ff53192e1b0>, [("select\n        l_returnflag,\n        l_linestatus,\n      ...E) < CAST(\'1995-01-01\' AS DATE)\n  AND CAST("lineitem"."l_shipdate" AS DATE) >= CAST(\'1994-01-01\' AS DATE)'), ...])

    def test_execute_tpc_h(prepare_data):
        conn, sqls = prepare_data
        for sql, _ in sqls[:1]:
            expected = conn.execute(sql).fetchdf()
>           result = execute(
                parse_one(sql, dialect="duckdb").transform(_to_csv).sql(pretty=True),
                TPCH_SCHEMA,
                dialect="duckdb",
            ).fetch()

xorbits_sql/tests/test_tpc_h.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xorbits_sql/core.py:112: in execute
    result = XorbitsExecutor(tables=tables_, schema=schema).execute(plan)
xorbits_sql/executor.py:226: in execute
    xorbits.run(df)
../xorbits/python/xorbits/core/execution.py:55: in run
    mars_execute(mars_tileables, **kwargs)
../xorbits/python/xorbits/_mars/deploy/oscar/session.py:1760: in execute
    return session.execute(
../xorbits/python/xorbits/_mars/deploy/oscar/session.py:1576: in execute
    execution_info: ExecutionInfo = fut.result(
../../miniconda3/lib/python3.10/concurrent/futures/_base.py:458: in result
    return self.__get_result()
../../miniconda3/lib/python3.10/concurrent/futures/_base.py:403: in __get_result
    raise self._exception
../xorbits/python/xorbits/_mars/deploy/oscar/session.py:1740: in _execute
    await execution_info
../xorbits/python/xorbits/_mars/deploy/oscar/session.py:124: in wait
    return await self._aio_task
../xorbits/python/xorbits/_mars/deploy/oscar/session.py:873: in _run_in_background
    raise task_result.error.with_traceback(task_result.traceback)
../xorbits/python/xorbits/_mars/services/task/supervisor/processor.py:378: in run
    await asyncio.to_thread(self._preprocessor.optimize)
../../miniconda3/lib/python3.10/asyncio/threads.py:25: in to_thread
    return await loop.run_in_executor(None, func_call)
../../miniconda3/lib/python3.10/concurrent/futures/thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
../xorbits/python/xorbits/_mars/services/task/supervisor/preprocessor.py:155: in optimize
    self.tileable_optimization_records = optimize_tileable_graph(
../xorbits/python/xorbits/_mars/optimization/logical/tileable/core.py:50: in optimize
    return TileableOptimizer.optimize(tileable_graph)
../xorbits/python/xorbits/_mars/core/mode.py:78: in _inner
    return func(*args, **kwargs)
../xorbits/python/xorbits/_mars/optimization/logical/core.py:277: in optimize
    if rule.apply():
../xorbits/python/xorbits/_mars/optimization/logical/tileable/column_pruning/column_pruning_rule.py:228: in apply
    self._build_context()
../xorbits/python/xorbits/_mars/optimization/logical/tileable/column_pruning/column_pruning_rule.py:113: in _build_context
    data, self._get_successor_required_columns(data)
../xorbits/python/xorbits/_mars/optimization/logical/tileable/column_pruning/column_pruning_rule.py:59: in _get_successor_required_columns
    *[self._context[successor][data] for successor in successors]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <list_iterator object at 0x7ff5114364d0>

>       *[self._context[successor][data] for successor in successors]
    )
E   KeyError: Series(op=DataFrameIndex)

../xorbits/python/xorbits/_mars/optimization/logical/tileable/column_pruning/column_pruning_rule.py:59: KeyError
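
The failing expression in the traceback, `self._context[successor][data]`, is a nested mapping lookup. It raises `KeyError` when either the successor has no entry in the context or the successor has no entry for that particular input. The traceback does not say which of the two happened here. The sketch below reproduces that pattern in isolation. It is not the Xorbits implementation or the actual fix: the `Context` layout, both function names, and the "keep every column" fallback are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Iterable, Set

# Hypothetical stand-in for the rule's context (an assumption, not Xorbits code):
# for each node, the columns it requires from each of its inputs.
Context = Dict[Hashable, Dict[Hashable, Set[str]]]


def required_columns_strict(
    context: Context, data: Hashable, successors: Iterable[Hashable]
) -> Set[str]:
    """Same lookup shape as the failing line: index the nested mapping directly.

    Raises KeyError if any successor is missing from the context, or has
    recorded nothing for `data`.
    """
    return set().union(*[context[successor][data] for successor in successors])


def required_columns_lenient(
    context: Context,
    data: Hashable,
    successors: Iterable[Hashable],
    all_columns: Iterable[str],
) -> Set[str]:
    """Assumed conservative fallback: if any successor has no recorded
    requirement for `data`, skip pruning and keep every column."""
    required: Set[str] = set()
    for successor in successors:
        entry = context.get(successor, {})
        if data not in entry:
            # Unknown requirement, so nothing can be pruned safely.
            return set(all_columns)
        required |= entry[data]
    return required


if __name__ == "__main__":
    # "index" plays the role of a successor that recorded nothing for "scan".
    ctx: Context = {"filter": {"scan": {"a", "b"}}, "index": {}}
    try:
        required_columns_strict(ctx, "scan", ["filter", "index"])
    except KeyError as exc:
        print("strict lookup failed with KeyError:", exc)
    print(sorted(required_columns_lenient(ctx, "scan", ["filter", "index"], ["a", "b", "c"])))
```

Falling back to all columns gives up the optimization for that input rather than guessing, which matches the observation that the test passes when column pruning is disabled. Whether the real rule should fall back like this or should register the missing operand in its context is a separate question that this sketch does not answer.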