Open RoyZhang2022 opened 5 months ago
I tried with MariaDB on Ubuntu and see the same problem.
$ ./target/release/flight_sql_client --host localhost --port 52358 statement-query "SELECT 1;"
Error: read flight data
Caused by:
    0: collect data stream
    1: ProtocolError("Received RecordBatch prior to Schema")
I tried updating arrow to 50.0.0 and arrow-odbc to 8.0.0, and also updated proto/Flight.proto and proto/FlightSql.proto to the newest versions, but that did not fix the problem.
I think the bug is here:
Compare with arrow-rs:
/// Convert `RecordBatch`es to wire protocol `FlightData`s
pub fn batches_to_flight_data(
    schema: &Schema,
    batches: Vec<RecordBatch>,
) -> Result<Vec<FlightData>, ArrowError> {
    let options = IpcWriteOptions::default();
    let schema_flight_data: FlightData = SchemaAsIpc::new(schema, &options).into();
    let mut dictionaries = vec![];
    let mut flight_data = vec![];
    let data_gen = writer::IpcDataGenerator::default();
    let mut dictionary_tracker = writer::DictionaryTracker::new(false);
    for batch in batches.iter() {
        let (encoded_dictionaries, encoded_batch) =
            data_gen.encoded_batch(batch, &mut dictionary_tracker, &options)?;
        dictionaries.extend(encoded_dictionaries.into_iter().map(Into::into));
        flight_data.push(encoded_batch.into());
    }
    let mut stream = vec![schema_flight_data];
    stream.extend(dictionaries);
    stream.extend(flight_data);
    let flight_data: Vec<_> = stream.into_iter().collect();
    Ok(flight_data)
}
The arrow-rs version explicitly encodes the schema and prepends it to the stream (vec) that it returns, but the arrow-flightsql-odbc version seems to be missing this step.
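To make the missing step concrete, here is a minimal std-only sketch of "put the schema message at the front of the stream". It is only a model: `Msg` and `with_schema_first` are hypothetical names standing in for `FlightData` and whatever function builds the response, and this is not the actual arrow-flightsql-odbc code or its fix.

```rust
// Hypothetical stand-in for FlightData; real code would carry encoded IPC bytes.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    Schema(String),
    Dictionary(u32),
    RecordBatch(u32),
}

/// Build the outgoing stream in the order the client expects:
/// one schema message, then dictionaries, then record batches.
fn with_schema_first(schema: &str, dictionaries: Vec<Msg>, batches: Vec<Msg>) -> Vec<Msg> {
    let mut stream = Vec::with_capacity(1 + dictionaries.len() + batches.len());
    // The step that appears to be missing: emit the schema before anything else.
    stream.push(Msg::Schema(schema.to_string()));
    stream.extend(dictionaries);
    stream.extend(batches);
    stream
}
```

In real code the same shape would presumably apply with the schema encoded via `SchemaAsIpc` as in the arrow-rs function above.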
According to the streaming IPC format, the schema must come before any dictionaries or record batches:
<SCHEMA>
<DICTIONARY 0>
...
<DICTIONARY k - 1>
<RECORD BATCH 0>
...
<DICTIONARY x DELTA>
...
<DICTIONARY y DELTA>
...
<RECORD BATCH n - 1>
<EOS [optional]: 0xFFFFFFFF 0x00000000>
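The ordering rule can also be written as a small check, which is roughly the kind of validation that would produce the "Received RecordBatch prior to Schema" error on the client side. This is a std-only sketch under that assumption; `Kind` and `check_schema_first` are hypothetical names, not part of arrow-flight.

```rust
// Hypothetical message kinds, mirroring the entries in the layout above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Schema,
    Dictionary,
    RecordBatch,
}

/// Return an error if a dictionary or record batch shows up before any schema.
fn check_schema_first(kinds: &[Kind]) -> Result<(), String> {
    let mut seen_schema = false;
    for (i, kind) in kinds.iter().enumerate() {
        match kind {
            Kind::Schema => seen_schema = true,
            // Any non-schema message before the schema violates the ordering.
            other if !seen_schema => {
                return Err(format!("message {}: received {:?} prior to Schema", i, other));
            }
            _ => {}
        }
    }
    Ok(())
}
```

A stream that starts with a record batch, which is what the server in this issue appears to send, fails the check at message 0.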
This is my first time seeing the codebase, so let me know if I missed a place where the schema is sent first in the stream :sweat_smile:
Hah, this code was written before it was documented that the schema must always come ahead of the data (if memory serves me well). I am happy to accept any PR that resolves this issue, but I currently do not have the time or motivation to fix it myself.
Thanks a lot @Jefffrey! I have identified a solution and have a fix. I will push a PR later.
Hi,
I am using arrow-flightsql-odbc to set up a Flight SQL service that converts MSSQL results to Arrow data. MSSQL itself is set up without problems, and connecting to it with MSSQL ODBC Driver 18 works. But when I test the service with flight_sql_client, I always get the error "Received RecordBatch prior to Schema". I am not sure what is wrong. Could you please help me figure out how to fix this problem? Thanks in advance! Please see details below.
The flight_sql_client is from arrow-flight.
Best, Roy