Closed: gouthamve closed this issue 4 years ago
I see the line that is erroring, but I don't understand why just yet. My initial guess is that a new int column appears in the Record but is not yet seen in the table's KeyTable (how this happens is unknown; I will have to investigate and test my current assumptions).
An easy mitigation is to resize Populated whenever we see an ID larger than the current max_int_key. You can try out master to see if this interim fix works.
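For illustration, the interim fix amounts to something like this (a minimal sketch; `populated` and the column-ID handling here are stand-ins, not sybil's actual internals):

```go
// ensurePopulated grows the per-record "populated" slice so that an
// int column ID beyond the current max key no longer indexes out of
// bounds. Hypothetical names; sybil's real structures differ.
func ensurePopulated(populated []bool, colID int) []bool {
	if colID < len(populated) {
		return populated
	}
	grown := make([]bool, colID+1)
	copy(grown, populated)
	return grown
}
```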
My question is: how are new columns being added to the table? Are your samples dynamically getting new columns on each ingestion?
I have ideas for mitigating this, but it is a little weird to me that it's happening. I would like to repro it and fix it properly.
> Are your samples dynamically getting new columns on each ingestion?
I am not sure, but I think so: https://github.com/gouthamve/redbull/blob/master/sybil.go#L171-L197
So spans have tags, and different spans can have a different set of tags, and I am creating a column for each tag. This could mean that in 10K rows there is only one row with the column `veryUniqueTag`. Is that a problem?
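For example, each span is flattened into one JSON record, with one field per tag, before being piped to sybil (the field names here are illustrative, not the exact output of that code):

```
echo '{"service": "frontend", "durationMs": 12, "veryUniqueTag": "x"}' \
  | sybil ingest -table spans
```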
Depending on what you are doing with those tags, a set column might be appropriate. If you aren't using the UI, you can add the tags as a set column by passing an array of string values to sybil. The set will show up when you run `sybil query -table <table> -info`, and you'll be able to use `in` and `nin` with `-set-filter` on the set fields, like `-set-filter "tags:in:foo"`. The set field will show up in the samples query as an array, but it can not be grouped by yet in other queries.
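Concretely, the flow looks roughly like this (table name is illustrative; double-check the exact flag spellings against your build's help output):

```
# ingest: an array-valued field becomes a set column
echo '{"service": "frontend", "tags": ["http", "veryUniqueTag"]}' \
  | sybil ingest -table spans

# the set column shows up in the table info
sybil query -table spans -info

# filter on set membership with in/nin
sybil query -table spans -set-filter "tags:in:veryUniqueTag" -samples
```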
Now that I know the scenario a bit better, I can investigate how to properly fix it.
The advantages/disadvantages of the two options:

One dynamic column per tag: each tag behaves like a normal column (so it can be filtered and grouped on), but high-cardinality tags create many sparse columns, which is what is stressing the KeyTable sizing above.

One set column for tags: a single column no matter how many distinct tags exist, filterable with `in`/`nin`, but set fields can not be grouped by yet.
I will pursue making the dynamic column generation safer.
Adding `-save-srb` (from a1d2f05be92ebce9a8e9d00013a88c54173f2c18 on master) should help alleviate this issue (crossing fingers).
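Trying it should just be a matter of adding the flag to your command line; which subcommand accepts it is worth confirming against master's help output (this sketch assumes ingest):

```
sybil ingest -table spans -save-srb < records.json
```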
I believe `-save-srb` worked here for tables that have lots of dynamic columns being added. I'm debating whether to always turn it on, but I will add information about this gotcha to the main README.
Hey everyone, merry Christmas! I'm trying to use sybil in a write-heavy scenario (as an index for https://github.com/jaegertracing/jaeger/) and got quite far. I'm flushing 256 records at a time and doing about 1000 records/sec. I'm skipping digestion on ingest and doing the digest in a background routine that runs every 2 seconds.
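The background digest is roughly this (a simplified sketch of what redbull does; see the repo linked below for the real code):

```go
import (
	"context"
	"log"
	"os/exec"
	"time"
)

// digestLoop runs `sybil digest` on a fixed interval instead of
// digesting on every ingest. Simplified; redbull's actual error
// handling and shutdown logic differ.
func digestLoop(ctx context.Context, table string) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			cmd := exec.CommandContext(ctx, "sybil", "digest", "-table", table)
			if err := cmd.Run(); err != nil {
				log.Printf("digest failed: %v", err) // the panic below surfaces here
			}
		}
	}
}
```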
I'm seeing this panic:
This is with the latest master; it fails again for the next few digest attempts but later succeeds. I am curious what might be causing it. Also, digestion is taking up to 10 seconds; is there anything I can optimise to make it faster?
Thanks a ton for this project!