
DuckDB acceleration panic for some tables #2946

Open mjzk opened 3 days ago

mjzk commented 3 days ago

Describe the bug

When changing my NYC taxi dataset from Arrow acceleration to DuckDB acceleration, spice run crashes with a panic raised from the duckdb-rs code.

For a tiny table with a simple schema, however, DuckDB acceleration works. It is reasonable to check duckdb-rs first (in fact, earlier versions of duckdb-rs had quite a few problems).

To Reproduce

Steps to reproduce the behavior:

  1. Change the NYC taxi dataset from Arrow acceleration to DuckDB acceleration.
  2. Run the command spice run.
  3. The Rust program panics.

Expected behavior

spice run should not crash. Currently it does, i.e., spiced cannot be started at all. This is a serious end-user failure and should be avoided in all cases.

Runtime Details

spice run
2024/10/03 19:34:26 INFO Checking for latest Spice runtime release...
2024/10/03 19:34:26 INFO Spice.ai runtime starting...
2024-10-03T11:34:26.918652Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-10-03T11:34:26.918647Z  INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2024-10-03T11:34:26.918817Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-10-03T11:34:26.918953Z  INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2024-10-03T11:34:26.918981Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
thread 'main' panicked at /home/jin/.cargo/git/checkouts/duckdb-rs-72597b63560c462f/5b98603/crates/duckdb/src/vtab/arrow.rs:229:5:
assertion `left == right` failed
  left: 18
 right: 14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at library/core/src/panicking.rs:221:5:
panic in a function that cannot unwind
stack backtrace:
   0:     0x64857c9c2ced - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h4ba39a43d3e87adf
   1:     0x64857c9f4dcb - core::fmt::write::h948fa78229aa764e
   2:     0x64857c9bc803 - std::io::Write::write_fmt::h60b4ed4b4cde96a3
   3:     0x64857c9c3e32 - std::panicking::default_hook::{{closure}}::h326e80fe334527f5
   4:     0x64857c9c3a9e - std::panicking::default_hook::h154dcec67b5cc2b3
   5:     0x64857c9c472f - std::panicking::rust_panic_with_hook::haa0d1f76c5d6487e
   6:     0x64857c9c43e3 - std::panicking::begin_panic_handler::{{closure}}::h8e01df9f228549a9
   7:     0x64857c9c31d9 - std::sys::backtrace::__rust_end_short_backtrace::h173dbc6039c4534b
   8:     0x64857c9c40a4 - rust_begin_unwind
   9:     0x64857c9f1dc5 - core::panicking::panic_nounwind_fmt::h0f494446fa6955b5
  10:     0x64857c9f1e52 - core::panicking::panic_nounwind::h8c281e8e006df584
  11:     0x64857c9f1f96 - core::panicking::panic_cannot_unwind::h4b39c2cc0e9904e6
  12:     0x648576b1271f - duckdb::vtab::func::hff48a4ff98eb4817
  13:     0x648577020e52 - _ZN6duckdb14CTableFunctionERNS_13ClientContextERNS_18TableFunctionInputERNS_9DataChunkE
  14:     0x64857820ef5f - _ZNK6duckdb17PhysicalTableScan7GetDataERNS_16ExecutionContextERNS_9DataChunkERNS_19OperatorSourceInputE
  15:     0x6485777b3d19 - _ZN6duckdb16PipelineExecutor15FetchFromSourceERNS_9DataChunkE
  16:     0x6485777c62f0 - _ZN6duckdb16PipelineExecutor7ExecuteEm
  17:     0x6485777c67da - _ZN6duckdb12PipelineTask11ExecuteTaskENS_17TaskExecutionModeE
  18:     0x6485777b6669 - _ZN6duckdb12ExecutorTask7ExecuteENS_17TaskExecutionModeE
  19:     0x6485777c45bc - _ZN6duckdb8Executor11ExecuteTaskEb
  20:     0x648578369d5d - _ZN6duckdb13ClientContext19ExecuteTaskInternalERNS_17ClientContextLockERNS_15BaseQueryResultEb
  21:     0x64857836a32c - _ZN6duckdb18PendingQueryResult15ExecuteInternalERNS_17ClientContextLockE
  22:     0x64857836a5c4 - _ZN6duckdb18PendingQueryResult7ExecuteEv
  23:     0x64857836d3e8 - _ZN6duckdb17PreparedStatement7ExecuteERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_18BoundParameterDataENS_33CaseInsensitiveStringHashFunctionENS_29CaseInsensitiveStringEqualityESaISt4pairIKS7_S8_EEEb
  24:     0x64857702b364 - duckdb_execute_prepared_arrow
  25:     0x64857700b7a5 - duckdb::raw_statement::RawStatement::execute::hc64d146d17cd377d
  26:     0x648576b12359 - duckdb::Connection::execute::hca540cacc4e0880c
  27:     0x648576b1f83f - datafusion_table_providers::duckdb::creator::TableCreator::create_with_tx::hf184181debcbf51e
  28:     0x648576b21a32 - datafusion_table_providers::duckdb::creator::TableCreator::create::h062232efc2c5cd4d
  29:     0x648576c28795 - <datafusion_table_providers::duckdb::DuckDBTableProviderFactory as datafusion_catalog::table::TableProviderFactory>::create::{{closure}}::h6086211a6d31cc26
  30:     0x64857506b53a - <runtime::dataaccelerator::duckdb::DuckDBAccelerator as runtime::dataaccelerator::DataAccelerator>::create_external_table::{{closure}}::h5719360e82e249bd
  31:     0x64857476c896 - runtime::datafusion::DataFusion::create_accelerated_table::{{closure}}::h2e43158348989e84
  32:     0x648574767ad9 - runtime::datafusion::DataFusion::register_table::{{closure}}::h917fc1fba6baa564
  33:     0x64857477ebdd - runtime::Runtime::register_loaded_dataset::{{closure}}::haf43ce0db78f639e
  34:     0x64857478dec4 - <backoff::future::Retry<S,B,N,Fn,Fut> as core::future::future::Future>::poll::h63df3093901a4f61
  35:     0x648574490e1a - runtime::Runtime::load_dataset::{{closure}}::hf53a1c8fd19f005e
  36:     0x648574424cf2 - <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll::h5e7c8e5c13a21389
  37:     0x648574391d9b - runtime::Runtime::load_datasets::{{closure}}::hbc0a0a2be7d2273b
  38:     0x648574424fdc - <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll::hab01ea9ae7a197b3
  39:     0x6485743b91e3 - <tokio::future::poll_fn::PollFn<F> as core::future::future::Future>::poll::h32bebbc9129e9431
  40:     0x6485746b31f3 - spiced::run::{{closure}}::he33ab0b290583a01
  41:     0x6485746a9c91 - tokio::runtime::park::CachedParkThread::block_on::hdfb407622002c555
  42:     0x6485744d7268 - tokio::runtime::context::runtime::enter_runtime::h5f97062683ce4313
  43:     0x64857483b379 - tokio::runtime::runtime::Runtime::block_on::h5282a6ea2bda9868
  44:     0x6485745e642f - spiced::main::hb7cbd6cbd971f490
  45:     0x64857474cee3 - std::sys::backtrace::__rust_begin_short_backtrace::h5aa962c3e8cdb166
  46:     0x6485746262d9 - std::rt::lang_start::{{closure}}::h5c7a26beda4fdb43
  47:     0x64857c9b26a0 - std::rt::lang_start_internal::he8065945e6c9e6c2
  48:     0x6485745e6735 - main
  49:     0x75e70a834e08 - <unknown>
  50:     0x75e70a834ecc - __libc_start_main
  51:     0x6485743074a5 - _start
  52:                0x0 - <unknown>
thread caused non-unwinding panic. aborting.
2024/10/03 19:34:27 ERROR error running Spice.ai error="os: process already finished"

Spicepod

spicepod.yml section:

version: v1beta1
kind: Spicepod
name: spice_app
datasets:
  - from: postgres:rides2
    name: rides2
    params:
      pg_host: 127.0.0.1
      pg_db: jin
      pg_port: 5432
      pg_user: jin
      pg_sslmode: disable
    acceleration:
      engine: duckdb
      mode: file
      params:
        duckdb_file: /data/n1/spice/ws/spice_app/data_accl/rides2.db
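
For reference, the same dataset worked before the switch, with a configuration that presumably differed only in the acceleration block (a sketch; arrow is the default in-memory accelerator engine):

acceleration:
  engine: arrow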

Output of describe table

+-----------------------+-----------------------------+-------------+
| column_name           | data_type                   | is_nullable |
+-----------------------+-----------------------------+-------------+
| vendor_id             | Utf8                        | YES         |
| pickup_datetime       | Timestamp(Nanosecond, None) | YES         |
| dropoff_datetime      | Timestamp(Nanosecond, None) | YES         |
| passenger_count       | Decimal128(38, 0)           | YES         |
| trip_distance         | Decimal128(38, 2)           | YES         |
| pickup_longitude      | Decimal128(38, 15)          | YES         |
| pickup_latitude       | Decimal128(38, 15)          | YES         |
| rate_code             | Int32                       | YES         |
| dropoff_longitude     | Decimal128(38, 15)          | YES         |
| dropoff_latitude      | Decimal128(38, 15)          | YES         |
| payment_type          | Int32                       | YES         |
| fare_amount           | Decimal128(38, 1)           | YES         |
| extra                 | Decimal128(38, 0)           | YES         |
| mta_tax               | Decimal128(38, 1)           | YES         |
| tip_amount            | Decimal128(38, 0)           | YES         |
| tolls_amount          | Decimal128(38, 0)           | YES         |
| improvement_surcharge | Decimal128(38, 1)           | YES         |
| total_amount          | Decimal128(38, 1)           | YES         |
+-----------------------+-----------------------------+-------------+

Output of explain query

n/a; spiced cannot be started, so no explain output is available.

spice, spiced, OS info

Have you tried this on the latest trunk branch?

If possible, run spiced with DEBUG log level

n/a

Screenshots

[screenshot attached]

Additional context

n/a

Sevenannn commented 1 day ago

Hi @mjzk, thanks for contributing and raising the issue!

This bug is caused by the lack of native Postgres schema inference in the DataFusion table provider: the Postgres schema is currently inferred from the first row of data. Any Postgres numeric / timestamptz value that is missing (NULL) in that first row causes the corresponding field to be missing from the accelerator schema, in this case DuckDB, and therefore causes a panic when the data is inserted into DuckDB.

We’ve created an issue to track the native Postgres schema inference implementation, and we will fix it in an upcoming release.
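
To make the failure mode concrete, here is a minimal, hypothetical Rust sketch (not the actual datafusion-table-providers code): inferring a schema from only the first row silently drops every column whose first value is NULL, so the accelerator table ends up with fewer columns than the incoming Arrow data, and a column-count assertion like the left == right one in the backtrace above (18 vs 14) fails at insert time.

enum Value {
    Int(i64),
    Text(&'static str),
    Null,
}

// Naive inference: a NULL in the first row carries no type information,
// so the column is silently dropped from the inferred schema.
fn infer_schema(first_row: &[Value]) -> Vec<&'static str> {
    first_row
        .iter()
        .filter_map(|v| match v {
            Value::Int(_) => Some("BIGINT"),
            Value::Text(_) => Some("VARCHAR"),
            Value::Null => None, // field lost
        })
        .collect()
}

fn main() {
    // Four source columns, but the third value is NULL in the first row.
    let first_row = [Value::Text("CMT"), Value::Int(1), Value::Null, Value::Int(2)];
    let schema = infer_schema(&first_row);
    // Only 3 of 4 columns survive. At insert time the incoming data still has
    // 4 columns, so a column-count check of the kind seen above must fail.
    assert_eq!(schema.len(), 3);
    println!("inferred {} of {} columns: {:?}", schema.len(), first_row.len(), schema);
}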

mjzk commented 1 day ago

OK, thanks for the clarification. It is interesting to learn that the Postgres schema is currently inferred from the first row of data; there are already several ways to obtain a Postgres table's schema in Rust. Happy to see this problem will be fixed soon.
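
For example, one such way (a sketch using the tokio-postgres crate; the connection parameters and table name are taken from the spicepod above) is to read the declared column types from information_schema instead of inferring them from data:

use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), tokio_postgres::Error> {
    // Connection parameters mirror the spicepod configuration above.
    let (client, connection) =
        tokio_postgres::connect("host=127.0.0.1 port=5432 user=jin dbname=jin", NoTls).await?;

    // The connection object drives the socket I/O; run it on its own task.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });

    // information_schema reports every column with its declared type,
    // regardless of whether the first row of data contains NULLs.
    let rows = client
        .query(
            "SELECT column_name::text, data_type::text, is_nullable::text
             FROM information_schema.columns
             WHERE table_name = $1
             ORDER BY ordinal_position",
            &[&"rides2"],
        )
        .await?;

    for row in rows {
        let name: &str = row.get(0);
        let data_type: &str = row.get(1);
        let nullable: &str = row.get(2);
        println!("{name}: {data_type} (nullable: {nullable})");
    }
    Ok(())
}

Mapping these declared types to Arrow types would give the accelerator a complete schema even when the first row contains NULLs.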