n-young / trustdb


start_timestamp and end_timestamp should use timestamps from datapoints #17

Closed desmondcheongzx closed 3 years ago

desmondcheongzx commented 3 years ago

From #14, we currently call Utc::now() when creating a new block and when freezing it to set start_timestamp and end_timestamp. Instead, we should use the range of timestamps of the datapoints actually included in the block, so that we can tell whether the block is relevant.
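A minimal sketch of that idea, assuming "relevant" means the block's range overlaps a queried time range. The `DataPoint` and `Block` types, their fields, and the plain `i64` second-resolution timestamps are hypothetical stand-ins for illustration, not trustdb's actual definitions:

```rust
/// Hypothetical datapoint: a timestamp (seconds since the Unix epoch) and a value.
#[derive(Debug, Clone, PartialEq)]
pub struct DataPoint {
    pub timestamp: i64,
    pub value: f64,
}

/// Hypothetical block whose time range is derived from the points it holds,
/// rather than from the wall clock at creation or freeze time.
#[derive(Debug, Default)]
pub struct Block {
    pub start_timestamp: Option<i64>,
    pub end_timestamp: Option<i64>,
    pub points: Vec<DataPoint>,
}

impl Block {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert a point and widen the block's range to cover its timestamp.
    pub fn insert(&mut self, point: DataPoint) {
        let ts = point.timestamp;
        self.start_timestamp = Some(self.start_timestamp.map_or(ts, |s| s.min(ts)));
        self.end_timestamp = Some(self.end_timestamp.map_or(ts, |e| e.max(ts)));
        self.points.push(point);
    }

    /// True if the block's range intersects the inclusive range [from, to].
    /// An empty block has no range and never overlaps.
    pub fn overlaps(&self, from: i64, to: i64) -> bool {
        match (self.start_timestamp, self.end_timestamp) {
            (Some(start), Some(end)) => start <= to && end >= from,
            _ => false,
        }
    }
}
```

With this shape, freezing a block would not need to touch the range at all, since it is already up to date after every insert. Using `Option` keeps an empty block from claiming a bogus range.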

n-young commented 3 years ago

Another issue is that all of the timestamps in the data_small.txt file fall within a 5-second window, which means there are only 5 different timestamp values, all at second-level granularity. That is a problem for over 1k data points... I wish they had ms-level granularity in timestamps :(

This creates two problems: 1) we need to change our indexing strategy, because the way we're doing it now we'll have a lot of blocks that span 0 or 1 seconds, and 2) we need to figure out what order even means in our series, since ordering by time is arbitrary when everything arrives at the same time.
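One possible answer to the ordering question, sketched under the assumption that arrival order is an acceptable tie-breaker: tag each record with a sequence number at ingest and order by `(timestamp, seq)`. The `Record` type and `ingest` function are hypothetical names for illustration only:

```rust
/// Hypothetical record: `seq` is the position in which the record arrived.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub timestamp: i64,
    pub seq: u64,
    pub value: f64,
}

/// Tag raw (timestamp, value) pairs with their arrival index, then order them
/// by timestamp, breaking ties between equal timestamps by arrival index.
pub fn ingest(raw: &[(i64, f64)]) -> Vec<Record> {
    let mut records: Vec<Record> = raw
        .iter()
        .enumerate()
        .map(|(i, &(timestamp, value))| Record {
            timestamp,
            seq: i as u64,
            value,
        })
        .collect();
    records.sort_by_key(|r| (r.timestamp, r.seq));
    records
}
```

Storing `seq` explicitly makes the order reproducible after records are split across blocks, which a stable sort alone would not guarantee.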

Alternatively, maybe our block sizes are just too damn small, and we need to be storing hundreds of thousands of records per block? It would make sense - high cardinality means each series is really small, so maybe right now we're storing about 1000 records in 1000 different series of length 1...
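A quick way to check that hunch would be to tally how many records land in each series. This is only a sketch: it assumes each record can be reduced to a string series key, and the `series_lengths` helper is hypothetical, not something in the repo:

```rust
use std::collections::HashMap;

/// Count how many records belong to each series key.
pub fn series_lengths<'a>(
    keys: impl IntoIterator<Item = &'a str>,
) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for key in keys {
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

If nearly every count comes back as 1, the series really are length 1 and per-series block size is not the thing to tune.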

n-young commented 3 years ago

Fixed