pbour / hint

Code for HINT: A Hierarchical Index for Intervals in Main Memory
MIT License

Need to handle long type values, help please! #2

Open LettuceBacon opened 1 year ago

LettuceBacon commented 1 year ago

Hello. I am trying to use this index to process intervals with long type endpoints. A record looks like 1640966444000 1640966934000. Executing the command in README.md gives a weird result: "Avg interval extent" and "Avg partition size" are "-nan", and "Indexing time" is so short (0.000060) that it looks like indexing wasn't executed. I then changed line 86 of def_global.h to "typedef long Timestamp;" but got the error below.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted

How should I edit the code to support long type values?
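For context on why the original run produced "-nan" values, here is a minimal sketch showing that an endpoint such as 1640966444000 does not fit in a 32-bit integer, while a fixed-width 64-bit type reads it correctly. The alias `Timestamp64` and the helper `parseInterval` are hypothetical names for illustration, not part of the HINT code; the actual contents of def_global.h are not shown in this thread. Note also that the width of `long` is platform-dependent (32 bits on 64-bit Windows, 64 bits on typical 64-bit Linux), which is one reason to prefer `std::int64_t` or `long long`.

```cpp
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Assumption: a fixed-width alias standing in for the project's Timestamp typedef.
using Timestamp64 = std::int64_t;

// The sample endpoint is larger than anything a 32-bit signed integer can hold.
static_assert(1640966444000LL > std::numeric_limits<std::int32_t>::max(),
              "sample endpoint exceeds the 32-bit range");

// Hypothetical helper: reads "start end" from one line of a data file.
// Returns false if extraction fails, which (since C++11) includes the case
// where a value is out of range for T.
template <typename T>
bool parseInterval(const std::string& line, T& start, T& end) {
    std::istringstream in(line);
    return static_cast<bool>(in >> start >> end);
}
```

With this helper, parsing "1640966444000 1640966934000" into `std::int32_t` variables reports failure, whereas parsing into `Timestamp64` succeeds and preserves both values.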

pbour commented 1 year ago

Hello. I changed line 86 in def_global.h to "typedef long long Timestamp;" and tested a small dataset containing only two intervals:

1640966444000 1640966934000
0 231232144347

and a query file containing

0 4

Here is the report of base HINT (no optimizations) and m = 3:

HINT^m
======
Num of intervals          : 2
Domain size               : 1640966934000
Avg interval extent [%]   : 7.045621

Optimizations             : no
Num of bits               : 3
Num of partitions         : 15
Num of Originals          : 2
Num of replicas           : 0
Num of empty partitions   : 13
Avg partition size        : 1.000000
Read VM [Bytes]           : 0
Read RSS [Bytes]          : 0
Indexing time [secs]      : 0.000045

Predicate type            : GOVERLAPS
Strategy                  : bottom-up
Num of runs per query     : 1
Num of queries            : 1
Avg query extent [%]      : 0.000000
Total result [XOR]        : 1
Total querying time [secs]: 0.000000
Avg querying time [secs]  : 0.000000

Throughput [queries/sec]  : inf

Everything seems to be working. The throughput is of course "inf" because the total query time is 0; that is a small bug, as the code does not check for zero before the division.
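The missing check before the division could be handled along these lines. This is only a sketch: `throughput` and `formatThroughput` are hypothetical helpers, not functions from the HINT code base, and it assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: queries per second, or std::nullopt when the measured
// time is zero (or otherwise not positive), so no division by zero occurs.
std::optional<double> throughput(std::size_t numQueries, double totalSecs) {
    if (!(totalSecs > 0.0)) {
        return std::nullopt;
    }
    return static_cast<double>(numQueries) / totalSecs;
}

// Hypothetical helper: renders the value for a report line, using "n/a"
// instead of "inf" when the timer resolution was too coarse to measure.
std::string formatThroughput(std::size_t numQueries, double totalSecs) {
    const std::optional<double> value = throughput(numQueries, totalSecs);
    return value ? std::to_string(*value) : std::string("n/a");
}
```

Returning an empty optional leaves the caller free to decide how to report an unmeasurable run, rather than printing "inf".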

LettuceBacon commented 1 year ago

Thanks for the reply!
After changing the typedef of "Timestamp", I got the proper result on the test case you mentioned. But I got "Segmentation fault" when I ran other test cases without int endpoints. Here are the details.

Test case 1:

Dataset containing only intervals with long long endpoints:

1640997298000 1640999660000
1640997297000 1640999680000

Query file containing only intervals with long long endpoints:

1640997288000 1640999760000
1640997297000 1640999680000

Run cmd:

./query_hint_m.exec -m 3 -t -q gOVERLAPS test.dat test.qry

Result:
Segmentation fault

Test case 2

Dataset containing intervals with long long and int endpoints:

4 12345
1640997297000 1640999680000

Query file containing only intervals with long long endpoints:

1640997288000 1640999760000
1640997297000 1640999680000

Run cmd:

./query_hint_m.exec -m 3 -t -q gOVERLAPS test.dat test.qry

Result:

HINT^m
======
Num of intervals          : 2
Domain size               : 1640999679996
Avg interval extent [%]   : 0.000073

Optimizations             : no
Num of bits               : 3
Num of partitions         : 15
Num of Originals          : 2
Num of replicas           : 0
Num of empty partitions   : 13
Avg partition size        : 1.000000
Read VM [Bytes]           : 0
Read RSS [Bytes]          : 0
Indexing time [secs]      : 0.000012

Predicate type            : GOVERLAPS
Strategy                  : top-down
Num of runs per query     : 1
Num of queries            : 2
Avg query extent [%]      : 0.000148
Total result [XOR]        : 2
Total querying time [secs]: 0.000000
Avg querying time [secs]  : 0.000000

Throughput [queries/sec]  : 6060606.060606

Test case 3

Dataset containing only intervals with long long endpoints:

1640997298000 1640999660000
1640997297000 1640999680000

Query file containing intervals with long long and int endpoints:

4 12345
1640997297000 1640999680000

Run cmd:

./query_hint_m.exec -m 3 -t -q gOVERLAPS test.dat test.qry

Result:
Segmentation fault

Test case 4

Dataset containing intervals in different order from Test case 2:

1640997298000 1640999660000
4 12345

Query file containing intervals with long long and int endpoints:

1640997298000 1640999660000
1640997297000 1640999680000

Run cmd:

./query_hint_m.exec -m 3 -t -q gOVERLAPS test.dat test.qry

Result:

HINT^m
======
Num of intervals          : 2
Domain size               : 1640999659996
Avg interval extent [%]   : 0.000072

Optimizations             : no
Num of bits               : 3
Num of partitions         : 15
Num of Originals          : 2
Num of replicas           : 0
Num of empty partitions   : 13
Avg partition size        : 1.000000
Read VM [Bytes]           : 0
Read RSS [Bytes]          : 0
Indexing time [secs]      : 0.000010

Predicate type            : GOVERLAPS
Strategy                  : top-down
Num of runs per query     : 1
Num of queries            : 2
Avg query extent [%]      : 0.000145
Total result [XOR]        : 0
Total querying time [secs]: 0.000000
Avg querying time [secs]  : 0.000000

Throughput [queries/sec]  : 9049773.755656
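One pattern in the four test cases may help narrow things down: both crashing runs (Test cases 1 and 3) use the same dataset and include a query that starts below the dataset's smallest endpoint, while both successful runs have a dataset minimum of 4 and no such query. This does not establish the cause, since the two crashing runs also share the same dataset. The sketch below is only a diagnostic for checking inputs against that hypothesis; `queriesStartingBeforeData` is a hypothetical helper, not part of HINT.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Timestamp64 = std::int64_t;                      // assumed 64-bit timestamp
using Interval = std::pair<Timestamp64, Timestamp64>;  // (start, end)

// Hypothetical diagnostic: returns the positions of queries whose start lies
// strictly below the smallest start in the dataset. Returns an empty list if
// the dataset itself is empty.
std::vector<std::size_t> queriesStartingBeforeData(
        const std::vector<Interval>& data,
        const std::vector<Interval>& queries) {
    std::vector<std::size_t> flagged;
    if (data.empty()) {
        return flagged;
    }
    Timestamp64 minStart = data.front().first;
    for (const auto& interval : data) {
        minStart = std::min(minStart, interval.first);
    }
    for (std::size_t i = 0; i < queries.size(); ++i) {
        if (queries[i].first < minStart) {
            flagged.push_back(i);
        }
    }
    return flagged;
}
```

Applied to the inputs above, this flags the first query of Test cases 1 and 3 and nothing in Test cases 2 and 4.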