yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0
1.22k stars, 112 forks

Cannot use QuantizedGraph::quantize #124

Closed (lerouxrgd closed this issue 1 year ago)

lerouxrgd commented 1 year ago

Hello @masajiro,

I am adapting ngt-rs to the latest NGT 2.0, and I am having trouble with QuantizedGraph::quantize. In versions 1.14.x I used to call quantize on a path containing a pre-built NGT index, and it worked fine. Now I get the error:

QuantizedGraph::quantize: Quantized graph is already existed.

It looks like the issue comes from this test. However, if NGTQ_QBG is not defined, we branch to the same test, but this time it quantizes the graph (the same behavior as before 2.0, I guess).

Am I using something incorrectly? Is it impossible to use QuantizedGraph when QBG is enabled?

lerouxrgd commented 1 year ago

Note that if I list the contents of my temporary index directory, there are only:

/tmp/.tmpUZ7Hcq/tre
/tmp/.tmpUZ7Hcq/prf
/tmp/.tmpUZ7Hcq/grp
/tmp/.tmpUZ7Hcq/obj

Whereas the error message says:

QuantizedGraph::quantize: Quantized graph is already existed. /tmp/.tmpUZ7Hcq/qg

And clearly /tmp/.tmpUZ7Hcq/qg does not exist.

masajiro commented 1 year ago

Hi @lerouxrgd, QBG::Index::quantize() does not work in 2.0 at the moment. Instead, several steps are needed to build a QG index in 2.0. Since those steps are a little complicated, I am going to provide a new quantize function that works like the one in 1.x.

masajiro commented 1 year ago

Finally, I have released v2.0.6, which includes an updated QBG::Index::quantize() that behaves the same as the function in v1.14.x.

lerouxrgd commented 1 year ago

Hi @masajiro, thank you for updating the code. I have tried to use it as in v1.14.x with the following test, which used to work:

        // Create an index for vectors of dimension 3
        let prop = Properties::dimension(3)?;
        let mut index = Index::create(dir.path(), prop)?;

        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1.clone())?;
        let _id2 = index.insert(vec2.clone())?;

        // Build and persist the index
        index.build(1)?;
        index.persist()?;

        let params = QGQuantizationParams::default();
        let index = QGIndex::quantize(index, params)?;

Where QGIndex::quantize just calls ngtqg_quantize from C API.

But then I get the following error:

build-qg: Warning! None is unavailable for the global type. Zero is set to the global type.
Error: Error("Capi : ngtqg_quantize() : Error: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Optimizer.h:optimize:323: the vector is empty")

Do you have any idea what the issue is? Should I use ngtqg_quantize differently?

masajiro commented 1 year ago

The number of inserted objects must be more than 16 to train the quantizer.
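The test above inserts only two vectors, so a guard on the object count can fail fast before the quantize step is reached. The Rust sketch below encodes only the threshold stated here (more than 16 objects). `check_trainable` and `synthetic_objects` are hypothetical helpers, not part of ngt-rs, and the vectors are deterministic pseudo-random values in 0..=255 produced by a simple linear congruential generator so that no external crate is needed.

```rust
/// Threshold stated in this thread: quantization needs MORE than this many objects.
const MIN_OBJECTS_EXCLUSIVE: usize = 16;

/// Hypothetical pre-check to run before calling the quantize step.
fn check_trainable(num_objects: usize) -> Result<(), String> {
    if num_objects > MIN_OBJECTS_EXCLUSIVE {
        Ok(())
    } else {
        Err(format!(
            "need more than {} objects to train the quantizer, got {}",
            MIN_OBJECTS_EXCLUSIVE, num_objects
        ))
    }
}

/// Hypothetical helper: `count` vectors of `dim` values in 0..=255,
/// generated with a 64-bit LCG so results are reproducible for a given seed.
fn synthetic_objects(count: usize, dim: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut state = seed;
    let mut objects = Vec::with_capacity(count);
    for _ in 0..count {
        let mut v = Vec::with_capacity(dim);
        for _ in 0..dim {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Use the high bits, which are better distributed in an LCG.
            v.push(((state >> 33) % 256) as f32);
        }
        objects.push(v);
    }
    objects
}

fn main() {
    let objects = synthetic_objects(32, 128, 42);
    match check_trainable(objects.len()) {
        // With ngt-rs, each vector would now be inserted, then build/persist/quantize.
        Ok(()) => println!("{} objects of dimension 128 are ready", objects.len()),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Meeting the count is necessary, but as the rest of this thread shows, it was not sufficient on its own here.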

lerouxrgd commented 1 year ago

Hmm, I have tried inserting many objects (up to 2k) of various dimensions (up to 512), but I always get the same error.

In my tests, the objects follow this pattern:

obj1 = [1, 2, 3]
obj2 = [4, 5, 6]
obj3 = [7, 8, 9]
...

I have also tried filling the objects with random numbers between 0 and 1, but I still get the error.

masajiro commented 1 year ago

Do you call NGT::Index::createIndex and NGT::Index::save before calling NGTQG::Index::quantize? The following is an example of building and searching a QG index. Run it from the root of the NGT repository so that it can load the data files.

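#include "NGT/NGTQ/QuantizedGraph.h"
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;  // the example uses unqualified string, vector, ifstream, cerr, ...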
int
main(int argc, char **argv)
{
  string        indexPath       = "index";
  string        objectFile      = "./data/sift-dataset-5k.tsv";
  string        queryFile       = "./data/sift-query-3.tsv";

  // NGT index construction
  try {
    NGT::Property       property;
    property.dimension          = 128;
    property.objectType         = NGT::ObjectSpace::ObjectType::Uint8;
    property.distanceType       = NGT::Index::Property::DistanceType::DistanceTypeL2;
    std::cout << "creating the index framework..." << std::endl;
    NGT::Index::create(indexPath, property);
    NGT::Index  index(indexPath);
    ifstream    is(objectFile);
    string      line;
    std::cout << "appending the objects..." << std::endl;
    while (getline(is, line)) {
      vector<float>     obj;
      stringstream      linestream(line);
      while (!linestream.eof()) {
        int value;
        linestream >> value;
        if (linestream.fail()) {
          obj.clear();
          break;
        }
        obj.push_back(value);
      }
      if (obj.empty()) {
        cerr << "An empty line or invalid value: " << line << endl;
        continue;
      }
      obj.resize(property.dimension);  // cut off additional data in the file.
      index.insert(obj);
    }
    std::cout << "building the index..." << std::endl;
    index.createIndex(16);
    index.save();
  } catch (NGT::Exception &err) {
    cerr << "Error " << err.what() << endl;
    return 1;
  } catch (...) {
    cerr << "Error" << endl;
    return 1;
  }

  // quantization
  size_t dimensionOfSubvector = 1;
  size_t maxNumberOfEdges = 50;
  try {
    std::cout << "quantizing the index..." << std::endl;
    NGTQG::Index::quantize(indexPath, dimensionOfSubvector, maxNumberOfEdges, true);
  } catch (NGT::Exception &err) {
    cerr << "Error " << err.what() << endl;
    return 1;
  } catch (...) {
    cerr << "Error" << endl;
    return 1;
  }

  // nearest neighbor search
  try {
    NGT::Index          index(indexPath);
    NGT::Property       property;
    index.getProperty(property);
    ifstream            is(queryFile);
    string              line;
    std::cout << "searching the index..." << std::endl;
    while (getline(is, line)) {
      vector<uint8_t>   query;
      {
        stringstream    linestream(line);
        while (!linestream.eof()) {
          int value;
          linestream >> value;
          query.push_back(value);
        }
        query.resize(property.dimension);
        cout << "Query : ";
        for (size_t i = 0; i < 5; i++) {
          cout << static_cast<int>(query[i]) << " ";
        }
        cout << "...";
      }

      NGT::SearchQuery          sc(query);
      NGT::ObjectDistances      objects;
      sc.setResults(&objects);
      sc.setSize(10);
      sc.setEpsilon(0.1);

      index.search(sc);
      cout << endl << "Rank\tID\tDistance: Object" << std::showbase << endl;
      for (size_t i = 0; i < objects.size(); i++) {
        cout << i + 1 << "\t" << objects[i].id << "\t" << objects[i].distance << "\t: ";
        NGT::ObjectSpace &objectSpace = index.getObjectSpace();
        uint8_t *object = static_cast<uint8_t*>(objectSpace.getObject(objects[i].id));
        for (size_t idx = 0; idx < 5; idx++) {
          cout << static_cast<int>(object[idx]) << " ";
        }
        cout << "..." << endl;
      }
      cout << endl;
    }
  } catch (NGT::Exception &err) {
    cerr << "Error " << err.what() << endl;
    return 1;
  } catch (...) {
    cerr << "Error" << endl;
    return 1;
  }

  return 0;
}
lerouxrgd commented 1 year ago

I am creating the NGT index and saving it before quantization.

I have followed your example; here are the essential steps:

let ndims = 128;
let props = NgtProperties::dimension(ndims)?
    .object_type(NgtObject::Uint8)?
    .distance_type(NgtDistance::L2)?;
let dir = tempdir()?;
let mut index = NgtIndex::create(dir.path(), props)?;

// Insert some objects of float32 (more than 16) ...

// Build and persist the index
index.build(1)?;
index.persist()?;

let params = QgParams {
    dimension_of_subvector: 1.0,
    max_number_of_edges: 50,
};
let index = QgIndex::quantize(index, params)?;

I always get the message:

build-qg: Warning! None is unavailable for the global type. Zero is set to the global type.

But if I run the test multiple times, the results differ. Sometimes the quantization process starts; sometimes it crashes with SIGABRT. I ran it under gdb and got the following backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff79716b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff7921958 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff790b53d in __GI_abort () at abort.c:79
#4  0x00007ffff7e77ea3 in NGT::Clustering::kmeans (this=0x7fffe7ffeca0, vectors=std::vector of length 25, capacity 25 = {...}, numberOfClusters=16, 
    clusters=std::vector of length 16, capacity 16 = {...}) at /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/Clustering.h:994
#5  0x00007ffff7ef8455 in _ZN3QBG9Optimizer16optimizeRotationEmRSt6vectorIS1_IfSaIfEESaIS3_EER6MatrixIfES9_S9_RS1_IS1_IN3NGT10Clustering7ClusterESaISC_EESaISE_EENSB_14ClusteringTypeENSB_18InitializationModeEmmmmbfmRdRNSA_5TimerEfb._omp_fn.0(void) () at /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Optimizer.h:253
#6  0x00007ffff78bd406 in gomp_thread_start (xdata=<optimized out>) at /usr/src/debug/gcc/libgomp/team.c:129
#7  0x00007ffff796f8fd in start_thread (arg=<optimized out>) at pthread_create.c:442
#8  0x00007ffff79f1a60 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Note that:

$ c++filt _ZN3QBG9Optimizer16optimizeRotationEmRSt6vectorIS1_IfSaIfEESaIS3_EER6MatrixIfES9_S9_RS1_IS1_IN3NGT10Clustering7ClusterESaISC_EESaISE_EENSB_14ClusteringTypeENSB_18InitializationModeEmmmmbfmRdRNSA_5TimerEfb._omp_fn.0

QBG::Optimizer::optimizeRotation(unsigned long, std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > >&, Matrix<float>&, Matrix<float>&, Matrix<float>&, std::vector<std::vector<NGT::Clustering::Cluster, std::allocator<NGT::Clustering::Cluster> >, std::allocator<std::vector<NGT::Clustering::Cluster, std::allocator<NGT::Clustering::Cluster> > > >&, NGT::Clustering::ClusteringType, NGT::Clustering::InitializationMode, unsigned long, unsigned long, unsigned long, unsigned long, bool, float, unsigned long, double&, NGT::Timer&, float, bool) [clone ._omp_fn.0]

So it looks like the QBG optimizer is involved even though I am creating a QG index with ngtqg_quantize. I don't know whether this is an issue or not.

masajiro commented 1 year ago

The message below is not related to this issue, so you can ignore it.

build-qg: Warning! None is unavailable for the global type. Zero is set to the global type.

Building a QG index calls the QBG functions, because QG has been implemented on top of QBG since v2.0.

Unfortunately, I have no idea how to resolve this issue at the moment.

masajiro commented 1 year ago

@lerouxrgd This release should resolve this issue. Could you check it?

@dmyzk Thank you for helping me resolve this issue.

lerouxrgd commented 1 year ago

@masajiro Indeed, I can now build a QGIndex correctly using the quantize function!

However, when I search for an object that I have inserted, it sometimes works, but sometimes I get this error:

ngt-8890cf0f1d61f324: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Quantizer.h:1306: void NGTQ::QuantizedObjectDistance::createFloatL2DistanceLookup(void*, size_t, void*, DistanceLookupTableUint8&): Assertion `tmp >= 0 && tmp <= 255' failed.

This is the assertion that fails.

Do you have any idea about it?

masajiro commented 1 year ago

From the error, it seems that you built NGT on a machine with neither AVX2 nor AVX512. However, QG and QBG require AVX2 or AVX512.

lerouxrgd commented 1 year ago

It seems that I have at least AVX2, according to: grep avx /proc/cpuinfo
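A similar check can be made from Rust, and it also shows why CPU support alone is not the whole answer. The sketch below is only an analogy: it separates whether the CPU supports AVX2 at runtime (`is_x86_feature_detected!`) from whether AVX2 code generation was enabled when this Rust program was compiled (`cfg!(target_feature = "avx2")`). Under a default `cargo build` the second flag is normally false even on an AVX2 machine. Neither flag says anything directly about how the C++ NGT library was compiled.

```rust
/// Returns (cpu_supports_avx2, compiled_with_avx2) for this Rust binary.
fn avx2_status() -> (bool, bool) {
    // Runtime detection via CPUID; only available on x86/x86_64.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let runtime = std::is_x86_feature_detected!("avx2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let runtime = false;

    // Compile-time flag: true only if AVX2 was enabled for code generation,
    // e.g. with RUSTFLAGS="-C target-cpu=native" or "-C target-feature=+avx2".
    let compiled = cfg!(target_feature = "avx2");
    (runtime, compiled)
}

fn main() {
    let (runtime, compiled) = avx2_status();
    println!("CPU supports AVX2:          {runtime}");
    println!("compiled with AVX2 enabled: {compiled}");
}
```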

masajiro commented 1 year ago

Just to be sure: when you build NGT, do you see the message below?

#warning "AVX2 is available for NGTQG"

lerouxrgd commented 1 year ago

Yes I see this message when building NGT.

masajiro commented 1 year ago

I have confirmed that this part is not compiled in when building on CPUs with AVX2, so this error is quite strange.

Another possibility is that, at runtime, your program loads a different NGT library that was compiled without the AVX2 or AVX512 option.

To confirm that, could you insert a line that outputs a message, like the one below, just before that line and run it again?

std::cerr << "*** this line is reached." << std::endl;

If you don't see this message, your program is loading another NGT library.

masajiro commented 1 year ago

@dmyzk found a way to avoid this issue as well. Thanks!

Since the cargo build environment is a little different from ordinary build environments, you have to explicitly enable AVX2, even if the CPU supports it. Could you insert the line below at this line?

config.define("NGT_AVX2", "ON");

When you build it, please add --release:

cargo build --release
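In a `build.rs` that uses the `cmake` crate, this suggestion amounts to one extra `define` call before `build()`. The sketch below keeps the CMake cache entries in a small hypothetical helper, `ngt_cmake_defines`, so they can be checked without compiling NGT. Only `NGT_AVX2=ON` comes from this thread. The forwarding code in the comment assumes the `cmake` crate's `Config::new`, `define`, and `build` methods and is not part of the runnable program.

```rust
/// Hypothetical helper for an ngt-sys style build script: the CMake cache
/// entries to pass to the NGT build.
fn ngt_cmake_defines(enable_avx2: bool) -> Vec<(&'static str, &'static str)> {
    let mut defines = Vec::new();
    if enable_avx2 {
        // Under cargo, AVX2 has to be requested explicitly, even on an AVX2 CPU.
        defines.push(("NGT_AVX2", "ON"));
    }
    defines
}

fn main() {
    // In a real build.rs these pairs would be forwarded to CMake, e.g.:
    //   let mut config = cmake::Config::new("NGT");
    //   for (k, v) in ngt_cmake_defines(true) { config.define(k, v); }
    //   let dst = config.build();
    for (k, v) in ngt_cmake_defines(true) {
        println!("-D{k}={v}");
    }
}
```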

lerouxrgd commented 1 year ago

It works fine now! Thank you for your help, @masajiro and @dmyzk! I will continue updating the Rust bindings to NGT 2.x.

lerouxrgd commented 1 year ago

@masajiro Actually, I have a quick follow-up question: is it possible to build NGT 2.x with both NGT_SHARED_MEMORY_ALLOCATOR=ON and Q(B)G enabled? I tried it, and some QG-related symbols are missing.

masajiro commented 1 year ago

Whenever NGT is built with NGT_SHARED_MEMORY_ALLOCATOR=ON, QBG and QG are disabled.
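A wrapper crate could turn this rule into an early, readable error instead of missing QG symbols at link time. The sketch below is hypothetical: `NgtBuildOptions` and `validate` are illustrative names, and the only rule it encodes is the one stated here, that NGT_SHARED_MEMORY_ALLOCATOR=ON disables QBG and QG.

```rust
/// Hypothetical build options for an NGT wrapper crate.
struct NgtBuildOptions {
    shared_memory_allocator: bool, // maps to NGT_SHARED_MEMORY_ALLOCATOR=ON
    quantized_graph: bool,         // caller wants QG/QBG support
}

/// Rejects the combination that cannot work: with the shared memory
/// allocator enabled, NGT is built without QBG and QG.
fn validate(opts: &NgtBuildOptions) -> Result<(), &'static str> {
    if opts.shared_memory_allocator && opts.quantized_graph {
        Err("QG/QBG are unavailable when NGT_SHARED_MEMORY_ALLOCATOR=ON")
    } else {
        Ok(())
    }
}

fn main() {
    let opts = NgtBuildOptions {
        shared_memory_allocator: true,
        quantized_graph: true,
    };
    if let Err(msg) = validate(&opts) {
        eprintln!("invalid build configuration: {msg}");
    }
}
```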

lerouxrgd commented 1 year ago

Thank you for the confirmation!