quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License
12.09k stars 670 forks source link

Add API to close tantivy index properly #1774

Open davideuler opened 1 year ago

davideuler commented 1 year ago

Is your feature request related to a problem? Please describe. I am calling Tantivy to create index and search in C++ project. We create indexes for each customer to provide search service. And when customer ask to delete their index by API, we will delete requested index. When deleting the index, error occurs as :

filesystem error: in remove_all: Directory not empty ["/tmp/index/mHDiE"]

The Rust part for Searcher (by CXX).

#[cxx::bridge]
mod ffi {

    // definition of Rust interface 
    extern "Rust" {
        type Searcher;

        fn create_searcher_with_param(path: &String, field_mappings:Vec<FieldMapping>, param: IndexParam) -> Result<Box<Searcher>>;

        fn add_document(searcher: &mut Searcher, docs:Vec<IdDocument>, commit: bool) -> Result<()>;

        fn search(searcher: & mut Searcher, query: &String, search_fields: & Vec<String>, search_param: & SearchParam) -> Result<Vec<IdDocument>>;

    }
} // mod ffi

// Rust Searcher definition
    pub struct Searcher{
        _index_path: String,
        schema: Schema,
        index_writer: IndexWriter,
        index_reader: IndexReader,
    }

Here is the related code in C++.

   class TantivyIndex {
        public:
            explicit TantivyIndex(std::filesystem::path path);

            void Search(const SearchRequest &request, SearchResponse &response) override;

            void AddDocuments(const std::vector<Document> &docs) override;

        private:
            std::filesystem::path path_; 
            rust::Box<Searcher> searcher_; // the Searcher which contained Tantivy Index 
   }

    class TantivyIndexTest : public ::testing::Test {
    protected:
        std::shared_ptr<TantivyIndex> index; // Box<
        std::filesystem::path path = std::filesystem::temp_directory_path() / uuid_v4() / "tantivy_index";
        std::shared_ptr<DocumentSource> rds = std::make_shared<RandomDocumentSource>();

        void removeIndex() override {
            index.reset(); // after reset(); 
            // this line crashes, says::  filesystem error: in remove_all: Directory not empty ["/tmp/index/mHDiE"]
            std::filesystem::remove_all(path);
        }

I checked the source code in Tantivy, and guess it is caused by background indexing threads.

Describe the solution you'd like

I think we need API to close tantivy index properly in such case.

[Optional] describe alternatives you've considered As a workaround, sleep some milliseconds after resetting/dropping the IndexWriter, the index directory can be removed successfully, as:

        void removeIndex() override {
            index.reset(); // after reset(); 
            std::this_thread::sleep_for(std::chrono::milliseconds (300));  // the remove_all(path) works after sleep_for().
            std::filesystem::remove_all(path);
        }
adamreichold commented 1 year ago

Maybe calling IndexWriter::wait_merging_threads already suffices?

davideuler commented 1 year ago

Thanks for your help. I tried and tested the method, it didn't work as IndexWriter.close() as expected.

The method IndexWriter::wait_merging_threads could not be called in C++ through CXX bridge. CXX need to pass reference to rust function in C++.

pub fn wait_merging_threads(self) -> Result<()>

I need to change the signature as following to pass mutable reference for invoking in C++:

pub fn wait_merging_threads(& mut self) -> Result<()>

After I update the signature in Rust IndexWriter::wait_merging_threads(& mut self), I can call it in C++ by CXX, But still need to wait some milliseconds to remove the index directory.

        void removeIndex() override {
            index.wait_merging_threads(); // Cxx bridge for IndexWriter::wait_merging_threads(& mut self)
            index.reset(); // after reset(); 
            std::this_thread::sleep_for(std::chrono::milliseconds (300));  // the remove_all(path) works after sleep_for().
            std::filesystem::remove_all(path);
        }
fulmicoton commented 1 year ago

You are on windows I assume?

davideuler commented 1 year ago

I am on Mac .

fulmicoton commented 1 year ago

ah right, the bug is about the directory being non empty... not about a file being protected for delete.