spiraldb / vortex

A toolkit for working with compressed Arrow in-memory, on-disk, and over-the-wire. "The LLVM of file formats"
Apache License 2.0
193 stars 8 forks source link

Layout readers do not handle projections with repeated columns #712

Open danking opened 2 weeks ago

danking commented 2 weeks ago

I expected this test to pass

#[tokio::test]
async fn test_repeated_projection() {
    let strings = ChunkedArray::from_iter([
        VarBinArray::from(vec!["ab", "foo", "bar", "baz"]).into_array(),
        VarBinArray::from(vec!["ab", "foo", "bar", "baz"]).into_array(),
    ])
    .into_array();

    let st = StructArray::from_fields(&[("strings", strings)]);
    let buf = Vec::new();
    let mut writer = LayoutWriter::new(buf);
    writer = writer.write_array_columns(st.into_array()).await.unwrap();
    let written = writer.finalize().await.unwrap();

    let mut stream = LayoutReaderBuilder::new(written, LayoutDeserializer::default())
        .with_projection(Projection::new([0, 0]))
        .with_batch_size(5)
        .build()
        .await
        .unwrap();

    stream.next().await.unwrap().unwrap();
}

but instead I get this error. Line 69 is the last line of the test.

---- layouts::tests::test_repeated_projection stdout ----
thread 'layouts::tests::test_repeated_projection' panicked at vortex-serde/src/layouts/tests.rs:69:34:
called `Result::unwrap()` on an `Err` value: Wrong state transition, message [0, 1] (with range [0, 576)) should have been fetched
Backtrace:
disabled backtrace
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
AdamGS commented 2 weeks ago

I think we should just dedup/fail gracefully right?

robert3005 commented 2 weeks ago

This should work imho. Right now the reader is dumb and just assumes unique set of columns is being read

robert3005 commented 2 weeks ago

I think this is the wrong level to make it work at. This should be unique set of columns to read. You can duplicate column at higher level