Closed authwork closed 3 years ago
What kind of queries do you need to support on the vector of integers? Orthogonal range queries? If so take a look at the MultidimensionalPGMIndex. If you only need to support lookups, then a hash table may be more appropriate.
Currently, I have some key-value pairs (uint160_t (32*5), value), how to use it as the key in dynamic PGM-index?
I will push some code to support the operation that you ask later this year 😉
@gvinciguerra
If I want to use MultidimensionalPGMIndex, how can I set it as KV pair as you show me at PGMIndex? And how to dynamically change it (add/delete).
struct KKKVVV{
std::tuple<uint32_t, uint32_t, uint32_t, uint32_t, uint32_t> pk;
VVV pv;
operator std::tuple<uint32_t, uint32_t, uint32_t, uint32_t, uint32_t>() const {return pk; }
};
typedef pgm::MultidimensionalPGMIndex<5, uint32_t, 32> u32MPGM;
void multi_pgm_test(std::vector<KKK> key, std::vector<VVV> value){
int number = key.size();
std::vector<KKKVVV> data(number);
struct timeval start, end;
gettimeofday(&start, NULL);
for(int i=0; i<number; i++){
std::get<0>(data[i].pk) = key[i].a >> 32;
std::get<1>(data[i].pk) = key[i].a & 0xFFFFFFFF;
std::get<2>(data[i].pk) = key[i].b >> 32;
std::get<3>(data[i].pk) = key[i].c >> 32;
std::get<4>(data[i].pk) = key[i].d >> 32;
data[i].pv = value[i];
}
u32MPGM pgm_index(data.begin(), data.end());
for(int i=0; i<number; i++){
for (auto it = pgm_index.range(data[i].pk, data[i].pk); it != pgm_index.end(); ++it){
VVV b = it->pv;
if (b.a != value[i].a || b.b != value[i].b || b.c != value[i].c) {
printf("failed\n");
return;
}
}
}
gettimeofday(&end, NULL);
int time_len = 1000000 * (end.tv_sec - start.tv_sec)
+ (end.tv_usec - start.tv_usec);
std::cout << "It costs " << time_len << std::endl;
uint64_t size = pgm_index.size_in_bytes();
printf("pgm size is %lld\n", size);
}
There is no support for this yet. Ideally, one could combine the design/implementation of DynamicPGMIndex
with the one of MultidimensionalPGMIndex
, with the proper extensions.
There is no support for this yet. Ideally, one could combine the design/implementation of
DynamicPGMIndex
with the one ofMultidimensionalPGMIndex
, with the proper extensions.
@gvinciguerra How about the first problem? It seems MultidimensionalPGMIndex only accepts for std::tuple data. What should I do to assign a structure data (KKKVVV
)? The code I show before will give following error log:
/usr/include/c++/9/tuple:1277:29: error: incomplete type ‘std::tuple_size<const KKKVVV>’ used in nested name specifier
1277 | inline constexpr size_t tuple_size_v = tuple_size<_Tp>::value;
PGM-index/include/pgm/pgm_index_variants.hpp:917:37: error: no matching function for call to ‘pgm::MultidimensionalPGMIndex<5, unsigned int, 32>::encode(const KKKVVV&)’
917 | data.emplace_back(encode(x));
Does replacing this line
with
std::for_each(first, last, [&](const value_type &x) {
work for you?
@gvinciguerra After this change, it shows:
pgm.cpp:169:16: error: ‘const value_type’ {aka ‘const class std::tuple<unsigned int, unsigned int, unsigned int, unsigned int, unsigned int>’} has no member named ‘pv’
169 | VVV b = it->pv;
This happens at query the value from the structure KKKVVV
.
for (auto it = pgm_index.range(data[i].pk, data[i].pk); it != pgm_index.end(); ++it){
VVV b = it->pv;
if (b.a != value[i].a || b.b != value[i].b || b.c != value[i].c) {
printf("failed\n");
return;
}
}
@gvinciguerra
Another try is to convert the five integer into one large double value. But double value is very hard to get equal when searching, can the user define this equal condition, said abs(a-b) < 1e-30?
Another try is to convert the five integer into one large double value.
I do not recommend this: with a double you cannot express all the configurations of values of the five uint32_t
s.
Anyway, as I said, I'll try to add the support for longer keys later this year ;)
@gvinciguerra Many thanks for your effort. How about this MultidimensionalPGMIndex issue?
@gvinciguerra After this change, it shows:
pgm.cpp:169:16: error: ‘const value_type’ {aka ‘const class std::tuple<unsigned int, unsigned int, unsigned int, unsigned int, unsigned int>’} has no member named ‘pv’ 169 | VVV b = it->pv;
This happens at query the value from the structure
KKKVVV
.for (auto it = pgm_index.range(data[i].pk, data[i].pk); it != pgm_index.end(); ++it){ VVV b = it->pv; if (b.a != value[i].a || b.b != value[i].b || b.c != value[i].c) { printf("failed\n"); return; } }
Indeed the MultidimensionalPGMIndex
act as a container for tuples, and in your example it stores only the pk
member of struct KKKVVV
. You should be able to modify the MultidimensionalPGMIndex
class to store also pv
alongside the tuple.
Hello! @gvinciguerra With the growing big data, sometimes we may need a vector of integers as the key to index some values. Currectly support is to use a hash function to combine such a vector of integer into a uint128_t number.
Is there any possible to extend the build-in ML model (f(x) = kx + b). For example, [x1, x2, x3, ..., xn], build a ML model f(x1, x2, x3, ..., xn) = c0 + c1x1 + c2x2 + ... + cnxn?
In this way, it could become more general. Currently, I use a multiple-level PGM index (#26) to implement this, but seems much more overhead (size, time, etc) than the single-level.
Many thanks.