Do you get the same performance drop-off if you don't use an index? If there is no index handling in the insert, then performance should be exactly the same as encoding time plus the normal Bolt insert time.
Insert times are much flatter without an index.
So I guess with time series data you don't really want to use an index, because the index gets huge.
Another way to do this might be to put each sample type in its own bucket.
Or, there may be a more efficient way to implement an index -- perhaps a separate bucket for each sample Type, and each index is a separate record in the bucket -- then adding records would be fast? Databases are fun to think about -- lots of tradeoffs to be made.
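If it helps make that idea concrete, here is a minimal sketch of such a layout using bbolt directly; the bucket name and the composite-key format are my own assumptions for illustration, not anything bolthold does today.

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

// addIndexEntry stores one small index key per record instead of one big
// blob per index value. Each entry is <type value> 0x00 <record key>, so an
// insert only adds a single key rather than rewriting a growing index entry.
func addIndexEntry(db *bolt.DB, sampleType string, recordKey []byte) error {
	return db.Update(func(tx *bolt.Tx) error {
		// Hypothetical bucket name; one bucket per indexed field.
		b, err := tx.CreateBucketIfNotExists([]byte("index:Type"))
		if err != nil {
			return err
		}
		entry := append(append([]byte(sampleType), 0x00), recordKey...)
		return b.Put(entry, []byte{})
	})
}

func main() {
	db, err := bolt.Open("samples.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := addIndexEntry(db, "temperature", []byte("0000000001")); err != nil {
		log.Fatal(err)
	}
	fmt.Println("index entry written")
}
```

With a layout like this, a lookup by Type becomes a cursor prefix scan, and each insert writes only one small key.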
Thanks for the help!
You should still be able to use indexes on time series data, but what I'm guessing is happening is that your index on "tag" might not be very unique. It's usually a good idea to have fairly unique values in indexes, though in a regular database it shouldn't impact insert performance this drastically.
However, what I do with indexes in bolthold is a pretty naive implementation: I simply store the entire index under one key value, so the less unique the index, the more gets stored (and thus decoded and re-encoded) on each insert. I'm guessing that's what's happening in your scenario here.
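For intuition, here is a rough sketch of that read-modify-write pattern (a simplification to illustrate the cost, not bolthold's actual code):

```go
package naiveindex

import (
	"bytes"
	"encoding/gob"
)

// addToNaiveIndex mimics keeping a whole index value under one key: the
// full list of record keys sharing an index value is decoded, appended to,
// and re-encoded on every insert, so the cost grows with the number of
// records under that value.
func addToNaiveIndex(indexBlob []byte, newKey []byte) ([]byte, error) {
	var keys [][]byte
	if len(indexBlob) > 0 {
		// Decode the entire existing index value.
		if err := gob.NewDecoder(bytes.NewReader(indexBlob)).Decode(&keys); err != nil {
			return nil, err
		}
	}
	keys = append(keys, newKey)

	// Re-encode everything, including all the unchanged keys; the whole
	// blob gets written back on every insert.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(keys); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

With N records already sharing an index value, every new insert decodes and re-encodes all N keys, which lines up with the roughly linear slowdown described above.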
I can make my index handling more like a "real" database, and split the values across multiple keys, but it'll take quite a bit of reworking.
I'll open an issue for that. I appreciate you bringing this up.
Yes, I'm using a small number of Types relative to the number of samples (maybe 6 or so), so they are not very unique.
One more note: without an index, and with 500,000 samples in the DB, the insert time is still ~50 ms/sample. This is great; it means I can use bolthold to record about any amount of time series data on this device. I'm currently using around 715 bytes/sample and would like to experiment with protobuf to see if that would be faster/more efficient.
Your discussion helped me a lot. Do you think the number of indexed fields also affects performance? I need to query logs by start/end date across 1,000,000 logs, so I think I need an index. Can I ask you for any suggestions?
Having many indexes will definitely impact performance of inserts and updates, because those indexes will need to be maintained on every insert and update.
I wouldn't recommend putting an index on a date/time if you can help it. Go Time structs are very accurate, so you'll end up with a very unique index. If you have start date and end date as fields, I would recommend having start as your key value and always querying with the start date.
One problem I ran into using the Go Time type as a key is that the gob-encoded data for a Go Time is not always monotonic with time, so seeks to a date would not always work. When I converted the timestamps to int64 and inserted the bytes into the key in big-endian format, seeks were very fast and reliable. I may be missing something, but since Go Time is a struct, it seems the encoded data for it will not always be monotonic.
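A minimal sketch of that conversion, assuming nanosecond Unix timestamps and post-1970 dates (the exact resolution and how the key is handed to the store are up to you):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"time"
)

// timeKey encodes a time.Time as an 8-byte big-endian key so lexicographic
// (bbolt) key ordering matches chronological ordering. Assumes timestamps
// are after 1970; negative UnixNano values would need an offset to sort correctly.
func timeKey(t time.Time) []byte {
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, uint64(t.UnixNano()))
	return key
}

func main() {
	a := timeKey(time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC))
	b := timeKey(time.Date(2020, 1, 2, 0, 0, 0, 0, time.UTC))
	fmt.Println(bytes.Compare(a, b) < 0) // true: byte order follows time order
}
```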
@timshannon Thank you for your advice. Should I use both a key and an index, or will just a key work?

> If you have start date and end date as fields, I would recommend having start as your key value, and always querying with the start date.
@cbrake Thanks. I will try to use int64 (Unix time), so I will take the query start/end dates as RFC3339, convert them to int64, and then query bolthold.
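A hedged sketch of that flow, assuming the records were inserted with an int64 Unix-time key; the Sample type, file name, and the key-range query via bolthold.Where(bolthold.Key) are my assumptions about how this would be wired up, not code from this thread:

```go
package main

import (
	"log"
	"time"

	"github.com/timshannon/bolthold"
)

// Sample is a placeholder record type keyed by an int64 Unix timestamp.
type Sample struct {
	Time  int64
	Type  string
	Value float64
}

// findRange parses RFC3339 start/end strings and returns records whose key
// falls in [start, end), using a range query on the key.
func findRange(store *bolthold.Store, startRFC3339, endRFC3339 string) ([]Sample, error) {
	start, err := time.Parse(time.RFC3339, startRFC3339)
	if err != nil {
		return nil, err
	}
	end, err := time.Parse(time.RFC3339, endRFC3339)
	if err != nil {
		return nil, err
	}

	var samples []Sample
	err = store.Find(&samples,
		bolthold.Where(bolthold.Key).Ge(start.Unix()).And(bolthold.Key).Lt(end.Unix()))
	return samples, err
}

func main() {
	store, err := bolthold.Open("samples.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer store.Close()

	samples, err := findRange(store, "2020-01-01T00:00:00Z", "2020-01-02T00:00:00Z")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("found %d samples", len(samples))
}
```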
I've been using bolthold on an embedded Linux system (eMMC storage). I'm noticing that as the DB grows, the write performance falls off linearly.
I'm using an increasing timestamp for the key, so I would expect the access pattern to be sequential rather than random.
Below is the insert code:
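(The original code block was not preserved; the following is only a guess at its general shape, assuming a Sample struct keyed by timestamp with an indexed Type field, so the rest of the discussion has something concrete to point at.)

```go
package main

import (
	"log"
	"time"

	"github.com/timshannon/bolthold"
)

// Sample is a placeholder; the boltholdIndex tag on Type is the kind of
// index whose insert cost is discussed below.
type Sample struct {
	Time  time.Time
	Type  string `boltholdIndex:"Type"`
	Value float64
}

// insertSample uses an increasing timestamp as the key, so inserts are
// append-ordered rather than random.
func insertSample(store *bolthold.Store, s Sample) error {
	return store.Insert(s.Time.UnixNano(), s)
}

func main() {
	store, err := bolthold.Open("samples.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer store.Close()

	if err := insertSample(store, Sample{Time: time.Now(), Type: "temperature", Value: 21.5}); err != nil {
		log.Fatal(err)
	}
}
```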
Once I get to 100,000 samples or so, performance is really slow (2+ seconds to insert a sample). I'm thinking something is not quite right, as I've read about people using multi-TB Bolt databases, but with my use case it seems there is no way this could work.
I tried setting FreelistType to FreelistMapType -- that did not seem to make any difference.
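For reference, this is the kind of option that refers to; a minimal sketch opening bbolt directly with the map-based freelist (bolthold can pass Bolt options through, but treat the exact wiring as an assumption):

```go
package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Use the map-based freelist instead of the default array freelist.
	// This changes how free pages are tracked; as noted above, it made no
	// measurable difference for this workload.
	db, err := bolt.Open("samples.db", 0600, &bolt.Options{
		FreelistType: bolt.FreelistMapType,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```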
I'd appreciate any thoughts: is this normal, or can it be optimized?
Cliff