Closed authwork closed 3 years ago
Yep, the computation of overall memory (data + index) seems correct.
@gvinciguerra I found more surprising things: I test the same program on different machines, but get totally different results by using PGM-index. Using c++ 9, the index size is 205532009 bytes. Using c++ 7, the index size is 309379385 bytes.
#include <iostream>
#include <sys/random.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <time.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include <unordered_map>
#include "pgm/pgm_index.hpp"
#include "pgm/pgm_index_dynamic.hpp"
#include "pgm/pgm_index_variants.hpp"
struct T{
uint64_t a, b, c;
};
typedef pgm::DynamicPGMIndex<uint64_t, T> u64L;
int main(){
int Max_J = 0x8F, Max_i = 0xFFFF;
uint64_t key = 0;
struct timeval start, end;
int checkRan = 1;
long time_len;
gettimeofday(&start, NULL);
pgm::DynamicPGMIndex<uint64_t, u64L*> d_index;
for (uint64_t j = 0; j < Max_J; j++) {
auto iter = d_index.find(j);
for (uint64_t i = 0; i < Max_i; i++) {
T a;
a.a = i;
a.b = 2 * i;
a.c = 3 * i;
if (iter == d_index.end()) {
u64L *p = new u64L();
d_index.insert_or_assign(j, p);
}
iter = d_index.find(j);
key = std::rand();
if(checkRan){
key = i;
}
iter->second->insert_or_assign(key, a);
}
}
gettimeofday(&end, NULL);
time_len = 1000000 * (end.tv_sec - start.tv_sec)
+ (end.tv_usec - start.tv_usec);
std::cout << "It costs " << time_len << std::endl;
start = end;
if (checkRan) {
for (uint64_t j = 0; j < Max_J; j++) {
auto iter = d_index.find(j);
for (uint64_t i = 0; i < Max_i; i++) {
T b = iter->second->find(i)->second;
if (b.a != i || b.b != 2 * i || b.c != 3 * i) {
printf("failed\n");
return -1;
}
}
}
}
uint64_t table_size = d_index.size_in_bytes();
for (auto &e : d_index) {
table_size += e.second->size_in_bytes();
}
printf("FP table second index is %lld\n", table_size);
return 0;
}
Could you check if you are using the master branch version of the PGM on both machines?
Yes, they are under the same version @ e2e82d77c50130c11941479301b9f835d2db1bb9.
Oh, it seems there are new updates.:)
@gvinciguerra After update to the last version, both of them become 309379385 bytes.
Thanks for checking.
It's strange, under gcc version 9.3.1 and commit e2e82d7 I get 309 MB as well. Indeed, the overall space of the u64L
instances should be at least Max_J * Max_i * (sizeof(uint64_t) + sizeof(T)) = 0x8F * 0xFFFF * (8 + 24) =
300 MB.
Could you check the output of gcc -v
?
It is gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
I should backup previous code, when I reset the head to commit e2e82d7, I found I cannot reproduce the result of 205M, although I used to get this result for multiple times in the past.
@gvinciguerra Hello, I have done a experiment to build multi-level PGM-index. Why? I use it to support very large integer, like a 256-bit long integer. Assume I have a uint256_t num, I set num[0] is a uint128_t with high 128 bits of num, num[1] is also a uint128_t with low 128 bits of num. To index with PGM-index
for insert. I will first insert a type1 variable into type2, then insert the real value to the corresponding type1, such as:
To calculate the size of the whole index:
Does
size
correctly calculate the memory size ofindex2nd
?