Closed lerouxrgd closed 1 year ago
The following source code is a sample showing how to use QG. Before running it, please run the commands below.
$ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv
$ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean_query.tsv
$ head -1 sift-128-euclidean_query.tsv > query.tsv
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

#include "NGT/Index.h"
#include "NGT/NGTQ/Capi.h"

int
main(int argc, char **argv)
{
  std::string indexPath = "index";
  std::string objectFile = "sift-128-euclidean.tsv";
  std::string queryFile = "query.tsv";

  std::cerr << "run the following commands to prepare data for this sample program." << std::endl;
  std::cerr << " curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv" << std::endl;
  std::cerr << " curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean_query.tsv" << std::endl;
  std::cerr << " head -1 sift-128-euclidean_query.tsv > query.tsv" << std::endl;
  std::cerr << std::endl;
  std::cerr << "index path=" << indexPath << std::endl;
  std::cerr << "object file=" << objectFile << std::endl;
  std::cerr << "query file=" << queryFile << std::endl;
  std::cerr << std::endl;

  NGTError err = ngt_create_error_object();
  NGTProperty prop = ngt_create_property(err);
  if (prop == NULL) {
    std::cerr << ngt_get_error_string(err) << std::endl;
    return 1;
  }
  size_t dimension = 128;
  ngt_set_property_dimension(prop, dimension, err);

  std::cerr << "create an empty index..." << std::endl;
  NGTIndex index = ngt_create_graph_and_tree(indexPath.c_str(), prop, err);
  if (index == NULL) {
    std::cerr << ngt_get_error_string(err) << std::endl;
    return 1;
  }

  std::cerr << "insert objects..." << std::endl;
  try {
    std::ifstream is(objectFile);
    std::string line;
    // Each line of the object file holds one whitespace-separated vector.
    while (getline(is, line)) {
      std::vector<double> obj;
      std::stringstream linestream(line);
      while (!linestream.eof()) {
        float value;
        linestream >> value;
        if (linestream.fail()) {
          obj.clear();
          break;
        }
        obj.push_back(value);
      }
      if (obj.empty()) {
        std::cerr << "An empty line or invalid value: " << line << std::endl;
        return 1;
      }
      if (ngt_insert_index(index, obj.data(), dimension, err) == 0) {
        std::cerr << ngt_get_error_string(err) << std::endl;
        return 1;
      }
    }
  } catch (NGT::Exception &err) {
    std::cerr << "Error " << err.what() << std::endl;
    return 1;
  } catch (...) {
    std::cerr << "Error" << std::endl;
    return 1;
  }

  std::cerr << "build the index..." << std::endl;
  if (ngt_create_index(index, 100, err) == false) {
    std::cerr << "Error:" << ngt_get_error_string(err) << std::endl;
    return 1;
  }

  std::cerr << "save the index..." << std::endl;
  if (ngt_save_index(index, indexPath.c_str(), err) == false) {
    std::cerr << ngt_get_error_string(err) << std::endl;
    return 1;
  }

  std::cerr << "close the index..." << std::endl;
  ngt_close_index(index);

  NGTQGQuantizationParameters quantizationParameters;
  ngtqg_initialize_quantization_parameters(&quantizationParameters);

  std::cerr << "quantize the index..." << std::endl;
  ngtqg_quantize(indexPath.c_str(), quantizationParameters, err);

  std::cerr << "open the quantized index..." << std::endl;
  index = ngtqg_open_index(indexPath.c_str(), err);
  if (index == NULL) {
    std::cerr << ngt_get_error_string(err) << std::endl;
    return 1;
  }

  std::ifstream is(queryFile);
  if (!is) {
    std::cerr << "Cannot open the specified file. " << queryFile << std::endl;
    return 1;
  }

  std::string line;
  // A std::vector replaces the variable-length array, which is not standard C++.
  std::vector<float> queryVector(dimension);
  if (getline(is, line)) {
    {
      std::vector<std::string> tokens;
      NGT::Common::tokenize(line, tokens, " \t");
      // Reject short rows before trimming; resizing first would make this check always pass.
      if (tokens.size() < dimension) {
        std::cerr << "dimension of the query is invalid. dimension=" << tokens.size() << ":" << dimension << std::endl;
        return 1;
      }
      // Drop any columns beyond the index dimension.
      tokens.resize(dimension);
      for (std::vector<std::string>::iterator ti = tokens.begin(); ti != tokens.end(); ++ti) {
        queryVector[std::distance(tokens.begin(), ti)] = NGT::Common::strtod(*ti);
      }
    }

    NGTObjectDistances result = ngt_create_empty_results(err);
    NGTQGQuery query;
    ngtqg_initialize_query(&query);
    query.query = queryVector.data();
    query.size = 10;
    query.result_expansion = 100;
    query.epsilon = 0.1;

    std::cerr << "search the index for the specified query..." << std::endl;
    ngtqg_search_index(index, query, result, err);
    auto rsize = ngt_get_result_size(result, err);
    std::cout << "Rank\tID\tDistance" << std::endl;
    for (size_t i = 0; i < rsize; i++) {
      NGTObjectDistance object = ngt_get_result(result, i, err);
      std::cout << i + 1 << "\t" << object.id << "\t" << object.distance << std::endl;
    }
    ngt_destroy_results(result);
  }

  std::cerr << "close the quantized index" << std::endl;
  ngtqg_close_index(index);
  ngt_destroy_error_object(err);
  return 0;
}
I hope that this will be helpful.
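The row-parsing step of the sample (one whitespace-separated vector per line) can be factored out and checked without NGT. The following is only a sketch under that assumption: parse_row is a hypothetical helper name, not part of the NGT API, and unlike the sample it rejects rows whose column count differs from the expected dimension.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of NGT): parse one whitespace-separated row
// into exactly `dimension` floats. Returns false for an empty row, a token
// that is not a number, or a wrong column count; `out` is only valid on true.
bool parse_row(const std::string &line, std::size_t dimension, std::vector<float> &out) {
  std::istringstream stream(line);
  std::string token;
  out.clear();
  while (stream >> token) {  // operator>> splits on spaces and tabs alike
    std::size_t consumed = 0;
    float value = 0.0f;
    try {
      value = std::stof(token, &consumed);
    } catch (const std::exception &) {
      return false;  // not a number, or out of float range
    }
    if (consumed != token.size()) {
      return false;  // trailing garbage such as "1.5abc"
    }
    out.push_back(value);
  }
  return out.size() == dimension;
}
```

Reading token by token avoids the eof() test used in the sample loop, so trailing whitespace at the end of a row is not treated as an invalid value.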
Thank you for the example; however, this is a QG index example. I am actually wondering how to build a QBG index (using the functions qbg_create and qbg_open_index, as I mentioned).
Since I was able to reproduce the issue you mentioned, I have released v2.0.10 to resolve it and added a usage example of the QBG C API as well.
Thank you for the fix and the example, it helps a lot!
I am able to run your qbg-capi example on my machine; however, when I try to write a very similar unit test in Rust, I cannot insert an object and get the following error:
Error: Error("Capi : qbg_append_object() : Error: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/ObjectFile.h:put:169: ObjectFile::Dimensions are inconsistency. 256:128")
Note that I set the construction parameter dimension to 128 and I only insert objects of dimension 128, so it is very strange to find 256 in the error message. If I try with a different dimension, there is always a factor of 2 in the error message.
The QBG unit test is here, and I have a similar one that works for the QG index here.
Actually, this was an issue on my side: I was not using the correct value for QbgObject::Float.
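For readers who hit the same message, here is one way a wrong data-type value can produce exactly a factor of 2. This is an assumption for illustration only: the thread says the value used for QbgObject::Float was wrong, but it does not show how NGT derives the number it reports. implied_dimension is a hypothetical helper, not NGT code.

```cpp
#include <cstddef>

// Hypothetical helper (not NGT code): the number of elements a reader infers
// from a raw buffer of `byte_count` bytes when it assumes each element is
// `assumed_element_size` bytes wide.
std::size_t implied_dimension(std::size_t byte_count, std::size_t assumed_element_size) {
  return byte_count / assumed_element_size;
}
```

With 4-byte floats, a 128-dimensional vector occupies 512 bytes. Read as 4-byte elements, that is 128 values; read as 2-byte elements, it is 256, which matches the 256:128 pair in the error message.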
I am now running into a deeper issue that leads to a SIGSEGV when I run the unit test (the same one I linked above).
Using gdb to debug it gives me:
Thread 2 "qbg::index::tes" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff69ff6c0 (LWP 638023)]
NGT::NeighborhoodGraph::BooleanVector::insert (i=1, this=0x7ffff69fd170) at /home/rleroux/dev/workspaces/rust/ngt-rs/ngt-sys/NGT/lib/NGT/Graph.h:815
815 inline void insert(size_t i) { std::vector<bool>::operator[](i) = true; }
(gdb) bt
#0 NGT::NeighborhoodGraph::BooleanVector::insert (i=1, this=0x7ffff69fd170) at /home/rleroux/dev/workspaces/rust/ngt-rs/ngt-sys/NGT/lib/NGT/Graph.h:815
#1 QBG::Index::searchBlobGraph (this=this@entry=0x7fff08001450, searchContainer=..., seeds=...) at /home/rleroux/dev/workspaces/rust/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/QuantizedBlobGraph.h:770
#2 0x00007ffff7f1e9ec in QBG::Index::searchBlobGraph (this=this@entry=0x7fff08001450, searchContainer=...) at /home/rleroux/dev/workspaces/rust/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/QuantizedBlobGraph.h:722
#3 0x00007ffff7f1f049 in qbg_search_index_ (results=<optimized out>, param=..., query=Python Exception <class 'gdb.error'>: value has been optimized out
<synthetic pointer>, pindex=0x7fff08001450)
at /home/rleroux/dev/workspaces/rust/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Capi.cpp:389
#4 qbg_search_index (index=0x7fff08001450, query=..., results=<optimized out>, error=0x7ffff006f2a0) at /home/rleroux/dev/workspaces/rust/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Capi.cpp:407
#5 0x000055555558d03f in ngt::qbg::index::QbgIndex::search (self=0x7ffff69fe170, query=...) at src/qbg/index.rs:116
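Frame #0 is an unchecked std::vector<bool>::operator[] write with i=1. Writing past the end of a vector through operator[] is undefined behavior, so one possible reading is that the vector was smaller than the index being inserted. That is an assumption; the backtrace alone does not establish the root cause. The sketch below shows a bounds-checked variant of such an insert. CheckedBooleanVector is a hypothetical class written for this illustration, not NGT's BooleanVector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a visited-set backed by std::vector<bool>.
// It is not NGT's class; it only illustrates checking the index before writing.
class CheckedBooleanVector {
 public:
  explicit CheckedBooleanVector(std::size_t size) : bits_(size, false) {}

  // Marks index i and returns true, or returns false when i is out of range
  // instead of writing outside the allocated storage.
  bool insert(std::size_t i) {
    if (i >= bits_.size()) {
      return false;
    }
    bits_[i] = true;
    return true;
  }

  // True only when i is in range and has been marked.
  bool contains(std::size_t i) const { return i < bits_.size() && bits_[i]; }

 private:
  std::vector<bool> bits_;
};
```

A vector sized for zero or one element would reject insert(1) here, whereas the unchecked form could crash or silently corrupt memory.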
Thank you for narrowing it down! Everything works fine now, so I will be able to update ngt-rs to NGT 2.0!
Hello @masajiro,
Currently I use qbg_create at a specific path, which returns true (and not a pointer to the QBGIndex), so right afterwards I use qbg_open_index with the same path to get a QBGIndex pointer, but it throws the following error:
Capi : qbg_open_index() : Error: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/QuantizedBlobGraph.h:Index:317: QBG::Index: No quantized blob graph. /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/QuantizedBlobGraph.h:load:988: Not found the rearranged inverted index. [/tmp/.tmp9KnL6r]
Should I proceed differently?