ytgui / temp

0 stars 0 forks source link

nearest neighbor search #100

Open ytgui opened 4 years ago

ytgui commented 4 years ago

https://en.wikipedia.org/wiki/Nearest_neighbor_search

ytgui commented 4 years ago

Index

https://github.com/microsoft/SPTAG https://github.com/facebookresearch/faiss

Img2Vec

sift orb ? dl

Word2Vec

? nlp

ytgui commented 4 years ago

Faiss

1-Flat

// build index
faiss::IndexFlatL2 index(n_dim);
index.add(n_batch, x_train);

// sanity check, make sure x_train's neighbor is it self
// n_y eq top_k
index.search(n_batch, x_train, n_y, y_distances, y_labels);

// real search
index.search(n_batch, x_batch, n_y, y_distances, y_labels);

IndexFlat.h

IndexFlat.cpp

ugly!!

  • IndexFlatL2::search -> knn_L2sqr -> knn_L2sqr_blas
    
    knn_L2sqr_blas(
    knn_L2sqr(x_batch,
    x_train,
    n_dim,
    n_batch,
    n_train,
    {n_batch, n_y, y_labels, y_distances}
    ), [](float dis) { return dis; // do nothing to dis });
ytgui commented 4 years ago

? 2-IVFFlat

// There are two parameters to the search method:
// nlist, the number of cells
// nprobe, the number of cells (out of nlist) that are visited to perform a search.
// The search time roughly increases linearly with the number of probes plus some constant due to the quantization.

// index
faiss::IndexFlatL2 quantizer(n_dim);
faiss::IndexIVFFlat index(&quantizer, n_dim, n_list, faiss::METRIC_L2);

// build
assert(!index.is_trained);
index.train(n_train, x_train);
assert(index.is_trained);
index.add(n_train, x_train);

// search
index.search(n_batch, x_batch, n_y, y_distances, y_labels);

// default nprobe is 1
index.nprobe = 10;
index.search(n_batch, x_batch, n_y, y_distances, y_labels);