yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0
1.26k stars 115 forks source link

Add new QBG methods to C API #138

Closed lerouxrgd closed 1 year ago

lerouxrgd commented 1 year ago

Hello,

Following up on #136, in order to have a type-safe API in ngt-rs for QBG indexes, the following functions would be required in the C API:

On a related topic, I have questions about the QBGConstructionParameters:

What is the internal_data_type ? Should it always be the same as data_type ?

Similarly, what is extended_dimension compared to dimension ? Are there some rules between them ?

masajiro commented 1 year ago

Hello, V2.0.13 has been released and includes the C APIs you requested.

internal_data_type is the vector data type in the graph for QBG. You should always set float to internal_data_type and data_type, because testing against other data types is not sufficient.

dimension must be set as the dimension of the given data. When the rotation is enabled, the given vector can be extended inside QBG to improve quantization errors. extended_dimension should be equal to or larger than dimension, and should be a multiple of 16.

lerouxrgd commented 1 year ago

Hello,

Thank you for the updates and the explanations. I was able to use qbg_append_object_as_uint8 and qbg_get_object_as_uint8 on my side.

However I am not sure to understand your answer regarding data_type/internal_data_type. If I look at the C API I can see that QBGConstructionParameters is initialized as follows:

void qbg_initialize_construction_parameters(QBGConstructionParameters *parameters)
{
  // ...
  parameters->internal_data_type = NGTQ::DataTypeFloat;
  parameters->data_type = NGTQ::DataTypeFloat;
  // ...
}

So I thought that we could also use NGTQ::DataTypeUint8 to create a QBG containing uint8_t objects. Is this correct ? Or is the only possible value NGTQ::DataTypeFloat ?

Actually, if I try to set them both to NGTQ::DataTypeUint8 I get a SIGABRT with the following gdb backtrace:

Thread 9 "qbg::index::tes" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdbffd6c0 (LWP 221840)]
0x00007ffff729f26c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff729f26c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff724fa08 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff7238538 in abort () from /usr/lib/libc.so.6
#3  0x00005555557ae73a in NGTQ::QuantizerInstance<unsigned char>::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
#4  0x0000555555769b75 in NGTQ::QuantizerInstance<unsigned char>::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, NGT::Property&, bool)
    ()
#5  0x000055555578567f in NGTQ::Index::getQuantizer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, NGT::Property&, bool) ()
#6  0x0000555555785b38 in NGTQ::Index::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
#7  0x0000555555785c1e in QBG::Index::Index(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) ()
#8  0x0000555555785ff7 in qbg_open_index ()
#9  0x00005555555c8eef in ngt::qbg::index::QbgIndex<u8, ngt::qbg::index::ModeWrite>::create<u8, &std::path::Path> (path=..., create_params=...) at src/qbg/index.rs:50
masajiro commented 1 year ago

Hello, NGTQ::DataTypeUint8 is for the obsolete module NGTQ. I will consider reimplementing uint8 as a data type for QBG in the near future.

masajiro commented 1 year ago

Hello, I have released v2.0.15 which includes reimplementation of uint8.

data_type is the type of the stored objects. internal_data_type is the type of the objects on the graph used by QBG. You can specify different data types for them.

lerouxrgd commented 1 year ago

Thanks a lot, it works fine now !

Looking at your commit it seems that QBG can also handle float16 ? Does QG handle float16 too ?

masajiro commented 1 year ago

Since I found that v2.0.15 does not work well in some cases, I have released v2.0.16 to fix it.

QG and QBG can handle float16 as well.

lerouxrgd commented 1 year ago

Thank you for the precision.

Then I guess some more functions would be needed in the C API for float16, I will list them in #141