timvw / arrow-flightsql-odbc

Apache License 2.0

Received RecordBatch prior to Schema for query #17

Open RoyZhang2022 opened 5 months ago

RoyZhang2022 commented 5 months ago

Hi,

I am using arrow-flightsql-odbc to set up a Flight SQL service that converts MSSQL results to Arrow data. MSSQL itself is set up without a problem, and I can connect to it with the MSSQL ODBC Driver 18. But when I test the service with flight_sql_client, I always get the error "Received RecordBatch prior to Schema". I am not sure what is wrong. Could you please help me figure out how to fix this problem? Thanks in advance! Please see the details below.

The flight_sql_client is from arrow-flight.

--ODBC connection string 
export ODBC_CONNECTION_STRING="Driver={ODBC Driver 18 for SQL Server};Server=tcp:mssql-im-adbc.database.windows.net,1433;Database=mssql-im-adbc;Uid=xxxxx;Pwd=xxxxxx;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"

--server is started
$ cargo run server
   Compiling arrow-flightsql-odbc v0.2.0 (/home/azuser/arrow-flightsql-odbc)
    Finished dev [unoptimized + debuginfo] target(s) in 7.60s
     Running `target/debug/flightsql-odbc-server server`
[2024-02-01T01:20:58Z INFO  flightsql_odbc_server] odbc_connection_string: Driver={ODBC Driver 18 for SQL Server};Server=tcp:mssql-im-adbc.database.windows.net,1433;Database=mssql-im-adbc;Uid=xxxxx;Pwd=xxxxxx;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;
[2024-02-01T01:20:58Z INFO  flightsql_odbc_server] binding to: 0.0.0.0:52358

--flight_sql_client command
$ ./target/release/flight_sql_client --host localhost --port 52358 statement-query "SELECT 1;"
Error: read flight data

Caused by:
    0: collect data stream
    1: ProtocolError("Received RecordBatch prior to Schema")

--output on server side
[2024-02-01T01:23:08Z INFO  arrow_flightsql_odbc::myserver] handling cmd: GetCommandSchema(GetCommandSchemaRequest { command: StatementQuery(CommandStatementQuery { query: "SELECT 1;" }), response_sender: Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) } })
[2024-02-01T01:23:08Z WARN  odbc_api::handles::logging] State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Changed database context to 'mssql-im-adbc'.
[2024-02-01T01:23:08Z WARN  odbc_api::handles::logging] State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Changed language setting to us_english.
[2024-02-01T01:23:08Z INFO  arrow_flightsql_odbc::myserver] handling cmd: GetCommandData(GetCommandDataRequest { command: StatementQuery(CommandStatementQuery { query: "SELECT 1;" }), response_sender: Sender { chan: Tx { inner: Chan { tx: Tx { block_tail: 0x7fc27c00a300, tail_position: 0 }, semaphore: (Semaphore { permits: 100 }, 100), rx_waker: AtomicWaker, tx_count: 1, rx_fields: "..." } } } })
[2024-02-01T01:23:08Z WARN  odbc_api::handles::logging] State: 01000, Native error: 5701, Message: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Changed database context to 'mssql-im-adbc'.
[2024-02-01T01:23:08Z WARN  odbc_api::handles::logging] State: 01000, Native error: 5703, Message: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Changed language setting to us_english.

Best, Roy

RoyZhang2022 commented 5 months ago

I tried with MariaDB on Ubuntu and see the same problem.

$ ./target/release/flight_sql_client --host localhost --port 52358 statement-query "SELECT 1;"
Error: read flight data

Caused by:
    0: collect data stream
    1: ProtocolError("Received RecordBatch prior to Schema")

RoyZhang2022 commented 5 months ago

I tried updating arrow to 50.0.0 and arrow-odbc to 8.0.0, and also updated proto/Flight.proto and proto/FlightSql.proto to the newest versions, but that still did not fix the problem.

Jefffrey commented 4 months ago

I think the bug is here:

https://github.com/timvw/arrow-flightsql-odbc/blob/d4ab6598d0d3cef984fbbbc45f34138554fe9d0f/src/odbc_command_handler.rs#L163-L180

Compare to arrow-rs:

/// Convert `RecordBatch`es to wire protocol `FlightData`s
pub fn batches_to_flight_data(
    schema: &Schema,
    batches: Vec<RecordBatch>,
) -> Result<Vec<FlightData>, ArrowError> {
    let options = IpcWriteOptions::default();
    let schema_flight_data: FlightData = SchemaAsIpc::new(schema, &options).into();
    let mut dictionaries = vec![];
    let mut flight_data = vec![];

    let data_gen = writer::IpcDataGenerator::default();
    let mut dictionary_tracker = writer::DictionaryTracker::new(false);

    for batch in batches.iter() {
        let (encoded_dictionaries, encoded_batch) =
            data_gen.encoded_batch(batch, &mut dictionary_tracker, &options)?;

        dictionaries.extend(encoded_dictionaries.into_iter().map(Into::into));
        flight_data.push(encoded_batch.into());
    }
    let mut stream = vec![schema_flight_data];
    stream.extend(dictionaries);
    stream.extend(flight_data);
    let flight_data: Vec<_> = stream.into_iter().collect();
    Ok(flight_data)
}

You can see that the arrow-rs version explicitly encodes the schema and prepends it to the stream (vec) that is returned, but in the arrow-flightsql-odbc version this step seems to be missing.

According to the streaming IPC format, the schema must come first, before any dictionaries or record batches:

<SCHEMA>
<DICTIONARY 0>
...
<DICTIONARY k - 1>
<RECORD BATCH 0>
...
<DICTIONARY x DELTA>
...
<DICTIONARY y DELTA>
...
<RECORD BATCH n - 1>
<EOS [optional]: 0xFFFFFFFF 0x00000000>

This is my first time seeing the codebase, so let me know if I missed a place where the schema is sent first in the stream :sweat_smile:
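To make the ordering rule concrete, here is a minimal std-only sketch. It is a toy model, not the arrow-rs or arrow-flight API: the `Message` enum, `validate_stream`, and `assemble_stream` are hypothetical names invented for illustration, and the "Received Dictionary prior to Schema" message is an assumption (only the RecordBatch message appears in the client output above). It shows why a stream that starts with a record batch is rejected, and how prepending the schema, as the arrow-rs function quoted above does, satisfies the check.

```rust
/// Toy stand-in for the message kinds in an Arrow IPC stream.
/// Hypothetical type for illustration only; not part of arrow-rs.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Schema,
    Dictionary(usize),
    RecordBatch(usize),
}

/// Check the ordering rule: the first message in the stream must be the schema.
/// An empty stream is accepted here because nothing is out of order.
fn validate_stream(stream: &[Message]) -> Result<(), String> {
    match stream.first() {
        None | Some(Message::Schema) => Ok(()),
        Some(Message::RecordBatch(_)) => {
            Err("Received RecordBatch prior to Schema".to_string())
        }
        // Assumed wording; only the RecordBatch message is taken from the report.
        Some(Message::Dictionary(_)) => {
            Err("Received Dictionary prior to Schema".to_string())
        }
    }
}

/// Assemble a stream in the same order as the quoted arrow-rs function:
/// schema first, then dictionaries, then record batches.
fn assemble_stream(dictionaries: Vec<Message>, batches: Vec<Message>) -> Vec<Message> {
    let mut stream = vec![Message::Schema];
    stream.extend(dictionaries);
    stream.extend(batches);
    stream
}

fn main() {
    // A stream that starts with a record batch reproduces the reported error.
    let broken = vec![Message::RecordBatch(0)];
    println!("{:?}", validate_stream(&broken));
    // prints: Err("Received RecordBatch prior to Schema")

    // Prepending the schema makes the same data acceptable.
    let fixed = assemble_stream(vec![Message::Dictionary(0)], vec![Message::RecordBatch(0)]);
    println!("{:?}", validate_stream(&fixed));
    // prints: Ok(())
}
```

In the real server the equivalent change would presumably be to emit the encoded schema as the first `FlightData` message before any batches, but that depends on code not shown in this thread.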

timvw commented 4 months ago

Hah, this code was written before it was documented that the schema must always come ahead of the data (if memory serves me well). I am happy to accept any PR that resolves this issue, but I currently do not have the time/motivation to fix it myself.

RoyZhang2022 commented 4 months ago

Thanks a lot @Jefffrey! I have identified a solution and got a fix! I will push a PR later.