quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License
12.02k stars 670 forks

New Fastfield Trait with Nullhandling #1678

Closed PSeitz closed 1 year ago

PSeitz commented 1 year ago

OptionalColumn should be the default, so all consumers would need to either handle Option<T> or try to convert to a full column.

pub trait OptionalColumn<T: PartialOrd = u64>: Send + Sync {

    fn get_val(&self, idx: u64) -> Option<T>;

    // TODO: Should output be `Option<T>` or `T`?
    fn get_range(&self, start: u64, output: &mut [T]);

    /// Return the positions of values which are in the provided range.
    fn get_positions_for_value_range(
        &self,
        value_range: RangeInclusive<T>,
        doc_id_range: Range<u32>,
        positions: &mut Vec<u32>,
    );

    fn min_value(&self) -> Option<T>;
    fn max_value(&self) -> Option<T>;

    fn num_vals(&self) -> u32;

    /// Returns an iterator over the data.
    fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = Option<T>> + 'a>;

    /// Returns a full column, substituting `default_val` wherever the value is None.
    fn or(&self, default_val: T) -> Arc<dyn Column<T>>;

    /// Returns a full column if all values are set and the column is not empty.
    fn into_full(&self) -> Option<Arc<dyn Column<T>>>;

    /// Possibly a docset, if we can utilize it.
    fn docset_with_value(&self) -> Box<dyn DocSet>;
}
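
To make the intended semantics of `or` and `into_full` concrete, here is a minimal sketch over an in-memory vector. It is only an illustration under assumptions: the `Column` trait below is a simplified stand-in, and `VecColumn` and `VecOptionalColumn` are hypothetical names, not tantivy types. The trait is trimmed to the methods needed for the example.

```rust
use std::sync::Arc;

/// Simplified stand-in for a full (non-nullable) column. Hypothetical.
pub trait Column<T: PartialOrd = u64>: Send + Sync {
    fn get_val(&self, idx: u64) -> T;
    fn num_vals(&self) -> u32;
}

/// Trimmed-down version of the proposed trait.
pub trait OptionalColumn<T: PartialOrd = u64>: Send + Sync {
    fn get_val(&self, idx: u64) -> Option<T>;
    fn num_vals(&self) -> u32;
    fn or(&self, default_val: T) -> Arc<dyn Column<T>>;
    fn into_full(&self) -> Option<Arc<dyn Column<T>>>;
}

/// Hypothetical dense column backed by a Vec.
struct VecColumn(Vec<u64>);

impl Column<u64> for VecColumn {
    fn get_val(&self, idx: u64) -> u64 {
        self.0[idx as usize]
    }
    fn num_vals(&self) -> u32 {
        self.0.len() as u32
    }
}

/// Hypothetical nullable column backed by a Vec of Options.
struct VecOptionalColumn(Vec<Option<u64>>);

impl OptionalColumn<u64> for VecOptionalColumn {
    fn get_val(&self, idx: u64) -> Option<u64> {
        self.0[idx as usize]
    }
    fn num_vals(&self) -> u32 {
        self.0.len() as u32
    }
    // Every None is replaced by `default_val`, so the result is always full.
    fn or(&self, default_val: u64) -> Arc<dyn Column<u64>> {
        let vals: Vec<u64> = self.0.iter().map(|v| v.unwrap_or(default_val)).collect();
        Arc::new(VecColumn(vals))
    }
    // Succeeds only if the column is non-empty and contains no None.
    fn into_full(&self) -> Option<Arc<dyn Column<u64>>> {
        if self.0.is_empty() {
            return None;
        }
        let vals: Option<Vec<u64>> = self.0.iter().copied().collect();
        vals.map(|v| Arc::new(VecColumn(v)) as Arc<dyn Column<u64>>)
    }
}
```

A real implementation would presumably wrap the underlying column lazily instead of copying the values; the copy here just keeps the sketch short.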

fulmicoton commented 1 year ago

It looks good!

We can forget about docset_with_value initially. into_full should probably be named differently if it does not consume self.

Weird question: do you think there is a world where we can have one trait and use GAT to deal with Optional / Required / Multi cardinality?

PSeitz commented 1 year ago

I did some tests with GATs, but quickly ran into issues because they are not object safe currently.

pub trait GATColumn: Send + Sync {
    type Value: PartialOrd;
    type Item<T>;
    fn get_val(&self, idx: u64) -> Self::Item<Self::Value>;
    fn into_full(
        &self,
        idx: u64,
    ) -> Arc<dyn GATColumn<Value = Self::Value, Item<Self::Value> = Self::Value>>;
}

error[E0038]: the trait `GATColumn` cannot be made into an object
  --> src/lib.rs:10:14
   |
10 |     ) -> Arc<dyn GATColumn<Value = Self::Value, Item<Self::Value> = Self::Value>>;
   |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `GATColumn` cannot be made into an object

https://github.com/rust-lang/rust/issues/81823
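
For contrast, the same GAT idea does work with static dispatch, as long as no `dyn` is involved. The following is a minimal sketch, assuming a compiler with stable GATs (Rust 1.65 or later); `GatColumn`, `FullColumn`, `OptColumn`, and `first` are hypothetical names used only for illustration.

```rust
/// Hypothetical GAT-based trait: `Item<T>` decides how a value is wrapped.
pub trait GatColumn {
    type Value: PartialOrd + Copy;
    type Item<T>;
    fn get_val(&self, idx: u64) -> Self::Item<Self::Value>;
}

/// Column where every document has a value.
struct FullColumn(Vec<u64>);
/// Column where a document may have no value.
struct OptColumn(Vec<Option<u64>>);

impl GatColumn for FullColumn {
    type Value = u64;
    type Item<T> = T; // values are returned bare
    fn get_val(&self, idx: u64) -> u64 {
        self.0[idx as usize]
    }
}

impl GatColumn for OptColumn {
    type Value = u64;
    type Item<T> = Option<T>; // values are wrapped in Option
    fn get_val(&self, idx: u64) -> Option<u64> {
        self.0[idx as usize]
    }
}

/// Static dispatch: one monomorphized copy per concrete column type.
fn first<C: GatColumn>(col: &C) -> C::Item<C::Value> {
    col.get_val(0)
}
```

The limitation reported above only appears once the trait has to go behind `Arc<dyn ...>`, which is what the fast field readers need.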

fulmicoton commented 1 year ago

Haha, I tried too. It is not GAT, but

trait Column<T> {
   type ItemContainer: Into<Option<T>>;
   fn get(&self, idx: u64) -> Self::ItemContainer;
}

You then manipulate `Arc<dyn Column<T, ItemContainer = T>>`.

The benefit is that you can use the same code for both containers and get monomorphization. (hopefully the code for ItemContainer = T will be as fast as if you did not play with Options.)

You still have to do some juggling/manual dispatch to build the collector though, and manipulate it.
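
A minimal sketch of how this could look, assuming the associated type is written as a bound (`type ItemContainer: Into<Option<T>>`). The `Required` and `Optional` structs, the `num_vals` method, and the `sum` helper are hypothetical and only there to exercise the trait.

```rust
use std::sync::Arc;

pub trait Column<T> {
    /// Either `T` itself or `Option<T>`; both convert into `Option<T>`.
    type ItemContainer: Into<Option<T>>;
    fn get(&self, idx: u64) -> Self::ItemContainer;
    fn num_vals(&self) -> u64;
}

struct Required(Vec<u64>);
struct Optional(Vec<Option<u64>>);

impl Column<u64> for Required {
    // `u64: Into<Option<u64>>` holds through std's `impl From<T> for Option<T>`.
    type ItemContainer = u64;
    fn get(&self, idx: u64) -> u64 {
        self.0[idx as usize]
    }
    fn num_vals(&self) -> u64 {
        self.0.len() as u64
    }
}

impl Column<u64> for Optional {
    // `Option<u64>: Into<Option<u64>>` holds through the reflexive `From` impl.
    type ItemContainer = Option<u64>;
    fn get(&self, idx: u64) -> Option<u64> {
        self.0[idx as usize]
    }
    fn num_vals(&self) -> u64 {
        self.0.len() as u64
    }
}

/// One generic body, compiled separately for each container type.
/// `?Sized` lets it accept `dyn Column<u64, ItemContainer = ...>` too.
fn sum<C: Column<u64> + ?Sized>(col: &C) -> u64 {
    let mut total = 0u64;
    for i in 0..col.num_vals() {
        let val: Option<u64> = col.get(i).into();
        if let Some(v) = val {
            total += v;
        }
    }
    total
}
```

Because the associated type is pinned in the object type, `Arc<dyn Column<u64, ItemContainer = u64>>` and `Arc<dyn Column<u64, ItemContainer = Option<u64>>>` are two distinct types. That is where the manual dispatch mentioned above comes from: the caller has to know which of the two it holds before calling `sum(&*col)`.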

trait Column<T> {
   type ItemContainer: IntoIterator<Item = T>;
   fn get(&self, idx: u64) -> Self::ItemContainer;
}

is even more tempting, as it might make it possible to work for Optional/Required/Multivalued all together. Unfortunately, I don't know how to work with Multivalued stuff without passing a vec buffer.
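
A rough sketch of the iterator-based variant, assuming the bound is spelled `IntoIterator<Item = T>`. All struct names, `num_docs`, and `sum` are hypothetical. The multivalued case clones a `Vec` on every call here, which is one way to see the buffer problem: without an allocation or a caller-supplied buffer, the returned container would have to borrow from the column.

```rust
pub trait Column<T> {
    /// Anything that yields zero, one, or many values for a document.
    type ItemContainer: IntoIterator<Item = T>;
    fn get(&self, idx: u64) -> Self::ItemContainer;
    fn num_docs(&self) -> u64;
}

struct Required(Vec<u64>);
struct Optional(Vec<Option<u64>>);
struct Multi(Vec<Vec<u64>>);

impl Column<u64> for Required {
    // Exactly one value per document.
    type ItemContainer = std::iter::Once<u64>;
    fn get(&self, idx: u64) -> Self::ItemContainer {
        std::iter::once(self.0[idx as usize])
    }
    fn num_docs(&self) -> u64 {
        self.0.len() as u64
    }
}

impl Column<u64> for Optional {
    // `Option<T>` already implements `IntoIterator<Item = T>`.
    type ItemContainer = Option<u64>;
    fn get(&self, idx: u64) -> Option<u64> {
        self.0[idx as usize]
    }
    fn num_docs(&self) -> u64 {
        self.0.len() as u64
    }
}

impl Column<u64> for Multi {
    // Allocates a fresh Vec per call; this is the cost the sketch does not solve.
    type ItemContainer = Vec<u64>;
    fn get(&self, idx: u64) -> Vec<u64> {
        self.0[idx as usize].clone()
    }
    fn num_docs(&self) -> u64 {
        self.0.len() as u64
    }
}

/// The same consumer code covers all three cardinalities.
fn sum<C: Column<u64>>(col: &C) -> u64 {
    (0..col.num_docs()).flat_map(|i| col.get(i)).sum()
}
```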