microsoft / SPTAG

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.
MIT License

double free or corruption (out) during Search #413

Open darae-lee opened 3 months ago

darae-lee commented 3 months ago

I want to evaluate SPTAG on the streaming track of big-ann-benchmarks (https://github.com/harsha-simhadri/big-ann-benchmarks/blob/main/neurips23/README.md). However, the search step aborts with a double free or corruption (out) error. The run has to process 10K queries, and the error occurs partway through, not on the first query.

Is there a problem with memory allocation and deallocation in the library? Or do I need to set some parameters?
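One way to narrow down where the abort happens is to log each query index before the native call and to enable Python's standard `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGABRT. The sketch below is a hypothetical helper, not part of SPTAG or the benchmark harness: `run_queries_logged` and `enable_crash_traceback` are illustrative names, and `search_fn` stands for any callable taking `(query, k)`.

```python
import faulthandler
import sys


def enable_crash_traceback():
    """Try to enable faulthandler; return True if it is active afterwards."""
    try:
        if not faulthandler.is_enabled():
            # Dumps the Python traceback on SIGSEGV, SIGABRT, SIGBUS, etc.
            faulthandler.enable()
    except (AttributeError, ValueError, OSError, RuntimeError):
        # sys.stderr may not be a real file (for example under some test runners).
        return False
    return faulthandler.is_enabled()


def run_queries_logged(search_fn, queries, k, log=sys.stderr):
    """Call search_fn(query, k) for each query, logging the index first.

    If the process aborts inside native code, the last logged index
    identifies the query that was in flight.
    """
    enable_crash_traceback()
    results = []
    for t, q in enumerate(queries):
        # Flush so the line reaches the log even if the next call aborts.
        print(f"query {t}", file=log, flush=True)
        results.append(search_fn(q, k))
    return results
```

With the benchmark wrapper, this could be called as `run_queries_logged(self.index.SearchWithMetaData, X, k)`. Knowing whether the failing query index is stable across runs may help separate a data-dependent problem from a timing-dependent one.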

Below is the benchmark code.

    def setup(self, dtype, max_pts, ndim):
        self.index = SPTAG.AnnIndex("BKT", "Float", ndim)

        self.index.SetBuildParam("NumberOfThreads", str(self.insert_threads), "Index")
        self.index.SetBuildParam("DistCalcMethod", self.translate_dist_fn(self._metric), "Index")
        self.max_pts = max_pts
        print('Index class constructed and ready for update/search')

    def insert(self, X, ids):
        self.index.SetBuildParam("NumberOfThreads", str(self.insert_threads), "Index")
        self.index.SetBuildParam("DistCalcMethod", self.translate_dist_fn(self._metric), "Index")
        p_meta = ''
        for i in ids:
            p_meta += str(i+1) + '\n'
        p_meta.encode()
        res = self.index.AddWithMetaData(X, p_meta, X.shape[0], True, False)

    def delete(self, ids):
        p_meta = ''
        for i in ids:
            p_meta += str(i+1) + '\n'
        p_meta.encode()

        self.index.DeleteByMetaData(p_meta)

    def query(self, X, k):
        result1 = []
        result2 = []

        for t in range(X.shape[0]):
            result = self.index.SearchWithMetaData(X[t], k)
            result1.append(result[2])
            result2.append(result[1])

        self.res = np.array(result1).reshape(-1, 10)
        self.query_dist = np.array(result2).reshape(-1, 10)
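
Two details in the snippet above may be worth ruling out before suspecting the library itself. First, `p_meta.encode()` returns a new bytes object and leaves `p_meta` unchanged, so `insert` and `delete` pass a `str` to the index. Whether the SPTAG Python wrapper requires `bytes` for metadata is an assumption here, not something confirmed in this issue. Second, `query` reshapes with a hard-coded 10 instead of `k`, which only works when `k == 10` and every query returns exactly `k` results. The helpers below are a minimal sketch of both points; `build_meta` and `stack_results` are hypothetical names.

```python
import numpy as np


def build_meta(ids):
    """Return newline-terminated, 1-based ids as a bytes object."""
    # str.encode() returns a new object, so the result has to be kept.
    return "".join(f"{int(i) + 1}\n" for i in ids).encode()


def stack_results(rows, k):
    """Stack per-query result lists into an array of shape (n_queries, k)."""
    for t, row in enumerate(rows):
        if len(row) != k:
            raise ValueError(
                f"query {t} returned {len(row)} results, expected {k}"
            )
    return np.array(rows).reshape(-1, k)
```

In the wrapper, `build_meta(ids)` would replace the string-building loops in `insert` and `delete`, and `stack_results(result1, k)` would replace `np.array(result1).reshape(-1, 10)`. Neither change is known to fix the reported abort; they only remove two possible sources of undefined input.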