jorgecarleitao / arrow2

Transmute-free Rust library to work with the Arrow format
Apache License 2.0
1.07k stars 221 forks source link

MutableDictionaryArray - another rewrite needed #1562

Closed aldanor closed 10 months ago

aldanor commented 10 months ago

@ritchie46 I think we've broken the mutable dictionary array again 😄 (I've tested it a bit and it seems to not behave properly in some cases, inserting extra values into the value map instead of reusing old ones).

https://docs.rs/hashbrown/latest/hashbrown/struct.HashMap.html#method.raw_entry_mut

In particular, the hash used to initialized the raw entry must still be consistent with the hash of the key that is ultimately stored in the entry. This is because implementations of HashMap may need to recompute hashes when resizing, at which point only the keys are available.

"Implementations of HashMap may need to recompute hashes when resizing"...

To confirm:

#[test]
fn test_big_dict() {
    let n = 10;
    let s = (0..10).map(|i| i.to_string()).collect::<Vec<_>>();
    let mut arr = MutableDictionaryArray::<u8, MutableUtf8Array<i32>>::new();
    for i in 0..n {
        arr.try_push(Some(&s[i])).unwrap();
    }
    assert_eq!(arr.values().len(), n);
    for j in 0..10_000 {
        for i in 0..n {
            arr.try_push(Some(&s[i])).unwrap();
        }
    }
    assert_eq!(arr.values().len(), n);
}
thread 'array::dictionary::mutable::test_big_dict' panicked at 'assertion failed: `(left == right)`
  left: `24`,
 right: `10`', tests/it/array/dictionary/mutable.rs:168:5
aldanor commented 10 months ago

How do we do this without resorting to evil hacks and pointers?... 🤔

The only thing that comes to mind is:

Not entirely sure it will work, but can give it a spin.