Closed (authwork closed this issue 3 years ago)
I commented out the code related to hashmap_test and alex_test, and the program exits with code 0 after printing:
It costs 1170674
pgm size is 92796943
So it does not seem related to pgm_test
...
Also, please do not open multiple issues for the same question (#26, #27). I try my best to answer when I can.
@gvinciguerra
I did as you suggested and commented out the code related to hashmap_test and alex_test.
But it still got killed. Very strange :(
I tried this code with g++ 9 and g++ 7 on different Ubuntu systems.
I am using version 4cdd5de6a941de4653da04e8cecd299e62b47788
#include <iostream>
#include <sys/random.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <time.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include <unordered_map>
#include <sys/time.h>
#include "pgm/pgm_index.hpp"
#include "pgm/pgm_index_dynamic.hpp"
#include "pgm/pgm_index_variants.hpp"

struct KKK {
    uint64_t a;
    uint32_t b, c, d;
};

struct VVV {
    uint64_t a, b, c;
};

typedef pgm::DynamicPGMIndex<uint32_t, VVV> u32P;
typedef pgm::DynamicPGMIndex<uint64_t, u32P*> u64PGM2;
typedef pgm::DynamicPGMIndex<uint64_t, u64PGM2*> u64PGM1;

void pgm_test(std::vector<KKK> key, std::vector<VVV> value) {
    u64PGM1 table;
    int number = key.size();
    struct timeval start, end;
    gettimeofday(&start, NULL);
    for (int i = 0; i < number; i++) {
        uint64_t k1 = key[i].a;
        uint64_t k2 = ((uint64_t)key[i].b) << 32 + (uint64_t)key[i].c;
        uint32_t k3 = key[i].d;
        auto iter = table.find(k1);
        if (iter == table.end()) {
            table.insert_or_assign(k1, new u64PGM2());
            iter = table.find(k1);
        }
        auto p2 = iter->second;
        auto iter1 = p2->find(k2);
        if (iter1 == p2->end()) {
            p2->insert_or_assign(k2, new u32P());
            iter1 = p2->find(k2);
        }
        iter1->second->insert_or_assign(k3, value[i]);
    }
    for (int i = 0; i < number; i++) {
        uint64_t k1 = key[i].a;
        uint64_t k2 = ((uint64_t)key[i].b) << 32 + (uint64_t)key[i].c;
        uint32_t k3 = key[i].d;
        auto iter = table.find(k1)->second->find(k2)->second->find(k3);
        VVV b = iter->second;
        if (b.a != value[i].a || b.b != value[i].b
            || b.c != value[i].c) {
            printf("failed\n");
            return;
        }
    }
    gettimeofday(&end, NULL);
    int time_len = 1000000 * (end.tv_sec - start.tv_sec)
                   + (end.tv_usec - start.tv_usec);
    std::cout << "It costs " << time_len << std::endl;
    uint64_t size = table.size_in_bytes();
    for (auto &e : table) {
        size += e.second->size_in_bytes();
        for (auto &e1 : *(e.second)) {
            size += e1.second->size_in_bytes();
        }
    }
    printf("pgm size is %lld\n", size);
}

int main() {
    std::vector<KKK> x_key;
    std::vector<VVV> x_value;
    for (int i = 0; i < 3199807; i++) {
        KKK a;
        VVV b;
        a.a = i >> 96;
        a.b = i >> 64;
        a.c = i >> 32;
        a.d = i;
        b.a = 1 * i;
        b.b = 2 * i;
        b.c = 3 * i;
        x_key.push_back(a);
        x_value.push_back(b);
    }
    pgm_test(x_key, x_value);
}
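A side note on the k2 line in the code above, offered as a minimal sketch rather than a diagnosis of this issue: in C++, + binds tighter than <<, so an expression of the form x << 32 + y parses as x << (32 + y), not (x << 32) + y. The two helpers below are hypothetical names introduced only for illustration; they are not part of the PGM-index library or of the posted code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: what "hi << 32 + lo" actually computes.
// The parentheses only make the compiler's parse explicit.
// Callers must keep 32 + lo below 64, otherwise the shift is undefined.
constexpr uint64_t pack_as_written(uint32_t hi, uint32_t lo) {
    return static_cast<uint64_t>(hi) << (32 + static_cast<uint64_t>(lo));
}

// Hypothetical helper: the packing that was presumably intended,
// with hi in the upper 32 bits and lo in the lower 32 bits.
// Using | instead of + is equivalent here because the low 32 bits
// of the shifted value are all zero.
constexpr uint64_t pack_intended(uint32_t hi, uint32_t lo) {
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

For example, pack_intended(1, 2) yields 4294967298 (2^32 + 2), while pack_as_written(1, 2) yields 2^34. Compiling with -Wall on GCC or Clang should flag the unparenthesized form through -Wparentheses.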
Update:
On a machine with around 500 GB of memory, the test finished with this output:
It costs 1648490
Hash table size is 175731056
It costs 15764876
alex size is 1575900636
It costs 75040896
pgm size is 4649320779
With only pgm_test, I got
It costs 69553765
pgm size is 4649320779
@gvinciguerra Why is my result so far from what you got? T_T
@gvinciguerra update
It costs 1133163
alex size is 24442892
It costs 3632979
pgm size is 16984181
I found a mistake in my code. The key fields should be computed as:
a.a = uint64_t((__uint128_t)i >> 96);
a.b = uint32_t((__uint128_t)i >> 64);
a.c = uint32_t((__uint128_t)i >> 32);
a.d = i;
This will produce the correct result.
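Why the cast matters, as a hedged sketch: i is a 32-bit int in the loop, and shifting an int by a count greater than or equal to its width (32, 64, or 96 here) is undefined behavior in C++, so the original a.a, a.b, and a.c had no reliable value. Widening to a 128-bit type first makes all three shifts well defined. The helper name make_key and the struct name Key below are assumptions for illustration (Key mirrors KKK from the post); __uint128_t is a GCC/Clang extension, as already used above.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the KKK struct from the post.
struct Key {
    uint64_t a;
    uint32_t b, c, d;
};

// Hypothetical helper: splits a counter into the four key fields.
// The value is widened to 128 bits before shifting, so shift counts
// of 32, 64, and 96 stay below the operand width.
inline Key make_key(uint32_t i) {
    const __uint128_t wide = static_cast<__uint128_t>(i);
    Key k;
    k.a = static_cast<uint64_t>(wide >> 96);
    k.b = static_cast<uint32_t>(wide >> 64);
    k.c = static_cast<uint32_t>(wide >> 32);
    k.d = i;
    return k;
}
```

Since every i in the test loop is below 2^32, this gives a = b = c = 0 and only d varies, so all entries land in a single innermost index. That may be part of why the timing and size differ so much from the earlier run, though that is a guess and not something confirmed in this thread.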
@gvinciguerra I have done a test among unordered_map, ALEX, and pgm_index. The result is very strange.
Are there any suggestions for designing a multi-level dynamic PGM-index? Many thanks in advance. If you would like to try it, please use