quickwit-oss / tantivy-py

Python bindings for Tantivy
MIT License

[feature request] Expose means of configuring/providing a custom Object Store extension #194

Open alexkohler opened 9 months ago

alexkohler commented 9 months ago

Hi there, I'm a user of lancedb, which leverages tantivy-py for full text search indices (see https://lancedb.github.io/lancedb/fts). A current shortcoming of the lancedb FTS support is that it's only supported for local file paths (something like S3 is not supported).

The lance maintainers have authored an Object Store extension, but as I understand it, there's no means of specifying or providing this extension within tantivy-py. Would love it if this could be supported!

changhiskhan commented 9 months ago

LanceDB maintainer here. We'd be happy to contribute if you can point us to the right place to configure the underlying Tantivy Rust code to integrate this extension. Thanks!

dgarnitz commented 9 months ago

I'm also in need of this feature. Any idea when it will be scheduled?

dtiarks commented 8 months ago

Also very much interested!

erixison commented 8 months ago

Also interested

zmtbnv commented 7 months ago

+1

cjrh commented 7 months ago

I'm looking at the example here: https://github.com/lancedb/tantivy-object-store/blob/main/examples/index_wiki_local.rs

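    // <snip>
    // Build a tantivy directory backed by an object store
    // (here the local filesystem).
    let dir = new_object_store_directory(
        Arc::new(LocalFileSystem::new()),
        dir.path().to_str().unwrap(),
        None,
        0,
        None,
        None,
    )
    .unwrap();

    // Create the index on that directory; from here on it is ordinary tantivy usage.
    let index_using_object_store =
        tantivy::Index::create(dir, schema.clone(), IndexSettings::default()).unwrap();

    // 1 GiB memory budget for the index writer.
    let mut writer = index_using_object_store.writer(1024 * 1024 * 1024).unwrap();
    // <snip>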

So the only difference is the dir object that is passed to Index::create(). Is that correct?
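To make the question concrete, here is a minimal, std-only sketch of the pattern being asked about: the index talks to storage only through a directory trait object, so swapping the backend means swapping the one value handed to the constructor. Everything in it is a hypothetical stand-in written for illustration (the `Directory` trait, `LocalDirectory`, `ObjectStoreDirectory`, and this toy `Index`); it is not the real tantivy or tantivy-object-store API, and both backends are just in-memory maps.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a directory abstraction: the index only
// reaches storage through this interface.
trait Directory {
    fn write(&mut self, path: &str, data: &[u8]);
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

// Toy "local" backend: an in-memory map keyed by file path.
#[derive(Default)]
struct LocalDirectory {
    files: HashMap<String, Vec<u8>>,
}

impl Directory for LocalDirectory {
    fn write(&mut self, path: &str, data: &[u8]) {
        self.files.insert(path.to_string(), data.to_vec());
    }
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }
}

// Toy "object store" backend: same interface, keys live under a prefix.
struct ObjectStoreDirectory {
    prefix: String,
    objects: HashMap<String, Vec<u8>>,
}

impl Directory for ObjectStoreDirectory {
    fn write(&mut self, path: &str, data: &[u8]) {
        self.objects
            .insert(format!("{}/{}", self.prefix, path), data.to_vec());
    }
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(&format!("{}/{}", self.prefix, path)).cloned()
    }
}

// Toy index: it never needs to know which backend it was given.
struct Index {
    dir: Box<dyn Directory>,
}

impl Index {
    fn create(dir: Box<dyn Directory>) -> Index {
        let mut index = Index { dir };
        index.dir.write("meta.json", b"{}");
        index
    }
    fn add_document(&mut self, id: u32, body: &str) {
        self.dir.write(&format!("doc-{id}"), body.as_bytes());
    }
    fn get_document(&self, id: u32) -> Option<String> {
        self.dir
            .read(&format!("doc-{id}"))
            .map(|bytes| String::from_utf8_lossy(&bytes).into_owned())
    }
}

fn main() {
    // The only thing that differs between the two runs is the directory value.
    let backends: Vec<Box<dyn Directory>> = vec![
        Box::new(LocalDirectory::default()),
        Box::new(ObjectStoreDirectory {
            prefix: "bucket/index".to_string(),
            objects: HashMap::new(),
        }),
    ];
    for dir in backends {
        let mut index = Index::create(dir);
        index.add_document(1, "hello tantivy");
        println!("{:?}", index.get_document(1));
    }
}
```

If the real crates follow this shape, exposing the feature in tantivy-py would mostly come down to letting Python callers choose which directory gets constructed on the Rust side, though that is an assumption rather than something confirmed in this thread.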

alex-au-922 commented 5 months ago

Hi LanceDB developers, would you be happy to create PyO3 bindings for your tantivy-object-store crate? I'm very willing to help.

cjrh commented 3 weeks ago

@alex-au-922 are you currently working on this somewhere else? Or alternatively, @changhiskhan what is the current status on this from the lancedb pov?

alex-au-922 commented 3 weeks ago

Nope. I'm afraid there could be code plagiarism issues, so I haven't started. I'd also like to hear LanceDB's view on this, as I think LanceDB is going to build their own retriever?