suharev7 / clickhouse-rs

Asynchronous ClickHouse client library for Rust programming language.
MIT License

Batch insert with different numbers of columns per row fails with OutOfRange error #129

Open RicoGit opened 3 years ago

RicoGit commented 3 years ago

I tried to insert a batch of rows into the DB and hit an OutOfRange error. The problem seems to occur when the first row contains fewer columns than the following ones. I found this piece of code. Why do we need this check: if block.row_count() <= 1?

fn put_param<K: ColumnType>(
    key: Cow<'static, str>,
    value: Value,
    block: &mut Block<K>,
) -> Result<()> {
    let col_index = match key.as_ref().get_index(&block.columns) {
        Ok(col_index) => col_index,
        Err(Error::FromSql(FromSqlError::OutOfRange)) => {
            if block.row_count() <= 1 {  // why?
                ...
            } else {
                return Err(Error::FromSql(FromSqlError::OutOfRange));
            }
        }
        Err(err) => return Err(err),
    };

    block.columns[col_index].push(value);
    Ok(())
}

For example, I have a table with 2 optional fields that have default values (a and b). I tried to insert the following batch:

(a: "foo")
(a: "foo2")
(a: "foo3", b: "bar")

This input can't be inserted: after row number 2 has been added to the block, the block has size 1, and the insert fails the check if block.row_count() <= 1.
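To make the failure easier to reproduce, here is a minimal toy model of the logic in put_param. It is a sketch, not the library's code: ToyBlock, PushError and push_row are hypothetical names, the elided branch is assumed to create the missing column, and row_count is assumed to be the length of the first column.

```rust
// Toy model only: hypothetical stand-ins, not the real clickhouse-rs types.
#[derive(Debug, PartialEq)]
enum PushError {
    OutOfRange,
}

#[derive(Default)]
struct ToyBlock {
    // Each column is a (name, values) pair.
    columns: Vec<(String, Vec<String>)>,
}

impl ToyBlock {
    // Assumption: the row count is taken from the first column's length.
    fn row_count(&self) -> usize {
        self.columns.first().map_or(0, |(_, values)| values.len())
    }

    // Same shape as put_param: an unknown column may only be created
    // while row_count() <= 1; after that it is rejected with OutOfRange.
    fn put_param(&mut self, key: &str, value: &str) -> Result<(), PushError> {
        let index = match self.columns.iter().position(|(name, _)| name.as_str() == key) {
            Some(index) => index,
            None if self.row_count() <= 1 => {
                // Assumed content of the elided branch: create the column.
                self.columns.push((key.to_string(), Vec::new()));
                self.columns.len() - 1
            }
            None => return Err(PushError::OutOfRange),
        };
        self.columns[index].1.push(value.to_string());
        Ok(())
    }

    // Pushes one row, one (column, value) pair at a time.
    fn push_row(&mut self, row: &[(&str, &str)]) -> Result<(), PushError> {
        for &(key, value) in row {
            self.put_param(key, value)?;
        }
        Ok(())
    }
}
```

In this model, pushing [("a", "foo")] and [("a", "foo2")] succeeds, and pushing [("a", "foo3"), ("b", "bar")] returns Err(PushError::OutOfRange), because column b is first seen when the first column already holds more than one value.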

suharev7 commented 3 years ago

The current implementation assumes that all rows have the same number and types of columns (but for optional fields, it is possible to implement adding columns when they are not in the first row). This check is needed because the first row creates all the columns in the block.
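One way the "adding columns when they are not in the first row" idea could look is sketched below. This is only an illustration under assumptions: SparseBlock and its fields are hypothetical names, values are plain strings, and None stands in for "use the column's default". How such a placeholder would map to a real ClickHouse default is not addressed here.

```rust
// Hypothetical sketch; none of these names come from clickhouse-rs.
#[derive(Default)]
struct SparseBlock {
    // Number of complete rows pushed so far.
    rows: usize,
    // Each column is a (name, values) pair; None marks a missing value.
    columns: Vec<(String, Vec<Option<String>>)>,
}

impl SparseBlock {
    fn push_row(&mut self, row: &[(&str, &str)]) {
        for &(key, value) in row {
            let index = match self.columns.iter().position(|(name, _)| name.as_str() == key) {
                Some(index) => index,
                None => {
                    // New column: backfill None for every earlier row.
                    self.columns.push((key.to_string(), vec![None; self.rows]));
                    self.columns.len() - 1
                }
            };
            self.columns[index].1.push(Some(value.to_string()));
        }
        self.rows += 1;
        // Pad the columns this row did not mention so all lengths stay equal.
        for (_, values) in &mut self.columns {
            values.resize(self.rows, None);
        }
    }
}
```

The point of the sketch is the invariant: every column must end up with exactly one entry per row, so a late column needs backfilling and an omitted column needs padding.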

RicoGit commented 3 years ago

> This check is needed because the first row creates all the columns in the block.

Why? What would break if we allowed columns to be created by any row in the block? And why <= 1 rather than == 0?
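A possible explanation for <= 1, offered as a guess rather than a confirmed answer: if the row count is read from the length of the first column (an assumption), then the count is already 1 while the remaining columns of the first row are still being added. The hypothetical helper below records the count seen just before each value of the first row is stored.

```rust
// Hypothetical helper, not part of clickhouse-rs. It simulates pushing the
// FIRST row column by column and records the row count observed before each
// value is stored, assuming the count is the first column's length.
fn counts_seen_during_first_row(keys: &[&str]) -> Vec<usize> {
    let mut columns: Vec<(String, Vec<u8>)> = Vec::new();
    let mut seen = Vec::new();
    for &key in keys {
        let row_count = columns.first().map_or(0, |(_, values)| values.len());
        seen.push(row_count);
        // Create the column and store one (dummy) value for the first row.
        columns.push((key.to_string(), vec![0]));
    }
    seen
}
```

Under that assumption, a first row with columns a, b and c observes the counts 0, 1 and 1, so a check of == 0 would reject every column after the first one in the very first row.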