quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License
12.08k stars 670 forks source link

Provide something along the lines of `open_or_create` that accepts `&Path` #817

Closed Gozala closed 4 years ago

Gozala commented 4 years ago

Is your feature request related to a problem? Please describe.

At the moment there does not seems to be a straight forward way to reuse index across sessions. Documentation and all the examples either instantiate new index or open existing one. There seems to be Index::open_or_create

https://github.com/tantivy-search/tantivy/blob/e0499118e225ba08cfe86cb148c1c0030ccb63a2/src/core/index.rs#L113-L127

I can see that Index::create_in_dir uses MmapDirectory::open(directory_path) which is probably I could use to create Directory

https://github.com/tantivy-search/tantivy/blob/e0499118e225ba08cfe86cb148c1c0030ccb63a2/src/core/index.rs#L102-L111

But then OpenDirectoryError does not seem to be exposed (anywhere I can find) so there is no good way to handle error.

Describe the solution you'd like

I would like method similar to Index::open_or_create with a following signature:

pub fn open_or_create <P: AsRef<Path>>(
        directory_path: P,
        schema: Schema,
    ) -> crate::Result<Index> {
}

So that user does not need to understand implementation details to get going.

[Optional] describe alternatives you've considered Alternatively library could expose some function to turn paths into Directories

pub fn directory<P: AsRef<Path>>(directory_path: P) -> create::Result<impl Directory>
Gozala commented 4 years ago

From what I can currently only way to achieve desired effect without reaching for internals is to do something like following:

let index =
  Index::create_in_dir(path, self.schema.clone()).or_else(|error| match error {
      TantivyError::IndexAlreadyExists => Ok(Index::open_in_dir(path)?),
      _ => Err(error),
  })?;
fulmicoton commented 4 years ago

@Gozala have you tried building a MMapDirectory using your path, and passing it to Index::open_or_create ?

Gozala commented 4 years ago

@fulmicoton Yes, but problem is that it can error and can’t be handled with ? as error type isn’t TantivyError so user needs to handle it differently which also happens to be complicated as that Error type isn’t exposed by tanitivy.

But ignoring all that, I think it would be much better not to concern users with internals like MMapDirectory because only reason (unless I’m overlooking something) one would create it is to tanitivy

fulmicoton commented 4 years ago

Yes, but problem is that it can error and can’t be handled with ? as error type isn’t TantivyError so user needs to handle it differently which also happens to be complicated as that Error type isn’t exposed by tanitivy.

I suspect you didn't try.

The following code compiles.

fn main() -> tantivy::Result<()> {
       let schema_builder = tantivy::schema::Schema::builder();
       let schema = schema_builder.build();
       let directory = tantivy::directory::MmapDirectory::open(".")?;
       let index = tantivy::Index::open_or_create(directory, schema)?;
       Ok(())
}
Gozala commented 4 years ago

That works if I your return is tantivy::Result<()>, but it's not a case for me as I have to wrap errors across multiple libraries I'm using so I see following:

`?` couldn't convert the error to `index::Error`
  --> base/src/index.rs:85:69
   |
85 |        let directory = tantivy::directory::MmapDirectory::open(path)?;
   |                                                                     ^ the trait `std::convert::From<tantivy::directory::error::OpenDirectoryError>` is not implemented for `index::Error`
   |
   = note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
   = help: the following implementations were found:
             <index::Error as std::convert::From<std::io::Error>>
             <index::Error as std::convert::From<std::string::FromUtf8Error>>
             <index::Error as std::convert::From<std::sync::PoisonError<std::sync::RwLockReadGuard<'a, tantivy::indexer::index_writer::IndexWriter>>>>
             <index::Error as std::convert::From<std::sync::PoisonError<std::sync::RwLockWriteGuard<'a, tantivy::indexer::index_writer::IndexWriter>>>>
             <index::Error as std::convert::From<tantivy::error::TantivyError>>
   = note: required by `std::convert::From::from`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0277`.

With the code like:

fn index(&self, path: &Path) -> Result<Index, Error> {
        // let index =
        //     Index::create_in_dir(path, self.schema.clone()).or_else(|error| match error {
        //         tantivy::TantivyError::IndexAlreadyExists => Ok(Index::open_in_dir(path)?),
        //         _ => Err(error),
        //     })?;

       let directory = tantivy::directory::MmapDirectory::open(path)?;
       let index = tantivy::Index::open_or_create(directory, self.schema.clone())?;
       index
            .tokenizers()
            .register("en_with_stopwords", Schema::tokenizer());

        Ok(index)
}

Which I see now that error is exposed so I could implement to address that

impl From<tantivy::directory::error::OpenDirectoryError> for Error {
  // ...
}

Still I think it would be prefer for those internals to not be user concern.