logv / sybil

columnar storage + NoSQL OLAP engine | https://logv.org

panic when digesting #105

Closed gouthamve closed 4 years ago

gouthamve commented 4 years ago

Hey everyone, Merry Christmas! I'm trying to use sybil in a write-heavy scenario (as an index for https://github.com/jaegertracing/jaeger/) and got quite far. I'm flushing 256 records at a time and doing about 1000 records/sec. I'm skipping digestion on ingest and running the digest in a background routine every 2 seconds.
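
For reference, the background digest routine is roughly this (a minimal sketch, assuming sybil's `digest` subcommand with `-table` and `-dir` flags; the loop itself is illustrative, not the exact code):

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

func main() {
	// Illustrative sketch: ingestion skips digestion, and this loop shells
	// out to `sybil digest` every 2 seconds. The table name and directory
	// match the paths in the log below.
	for range time.Tick(2 * time.Second) {
		cmd := exec.Command("sybil", "digest", "-dir", "/data/sybil-db", "-table", "jaeger")
		if err := cmd.Run(); err != nil {
			log.Println("digest failed:", err)
		}
	}
}
```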

I'm seeing this panic:

2019/12/24 20:15:34 WRITING PID 7595 TO LOCK /data/sybil-db/jaeger/info.lock
2019/12/24 20:15:34 LOCKING /data/sybil-db/jaeger/info.lock
2019/12/24 20:15:34 OPENING TABLE INFO FROM FILENAME /data/sybil-db/jaeger/info.db
2019/12/24 20:15:34 UNLOCKING /data/sybil-db/jaeger/info.lock
2019/12/24 20:15:34 WRITING PID 7595 TO LOCK /data/sybil-db/jaeger/stomache.lock
2019/12/24 20:15:34 LOCKING /data/sybil-db/jaeger/stomache.lock
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_095872863.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_502198671.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_762224642.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_582140463.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_214818442.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_113069503.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_482631791.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_418823239.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_351887204.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_071346249.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_470188504.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_199584949.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_097573806.db
2019/12/24 20:15:34 LOADED 256 FOR DIGESTION FROM /data/sybil-db/jaeger/stomache817963248/ingest_942070294.db
panic: runtime error: index out of range [240] with length 240

goroutine 1 [running]:
github.com/logv/sybil/src/lib.SavedRecord.toRecord(0xc005588300, 0x4, 0x4, 0xc005576900, 0xf, 0xf, 0x0, 0x0, 0x0, 0xc0000c4000, ...)
    /home/goutham/go/src/github.com/logv/sybil/src/lib/row_store.go:54 +0x995
github.com/logv/sybil/src/lib.(*Table).LoadRecordsFromLog(0xc0000c4000, 0xc00539ed80, 0x3b, 0xc00539ed80, 0x3b, 0x13)
    /home/goutham/go/src/github.com/logv/sybil/src/lib/row_store.go:132 +0x208
github.com/logv/sybil/src/lib.(*Table).LoadRowStoreRecords(0xc0000c4000, 0xc000116166, 0x11, 0xc0031a1d70)
    /home/goutham/go/src/github.com/logv/sybil/src/lib/table_ingest.go:176 +0x441
github.com/logv/sybil/src/lib.(*Table).DigestRecords(0xc0000c4000)
    /home/goutham/go/src/github.com/logv/sybil/src/lib/table_ingest.go:324 +0x489
github.com/logv/sybil/src/cmd.RunDigestCmdLine()
    /home/goutham/go/src/github.com/logv/sybil/src/cmd/cmd_digest.go:27 +0x233
main.main()
    /home/goutham/go/src/github.com/logv/sybil/main.go:95 +0xcc

This is with the latest master. It fails again for the next few digest attempts but later succeeds, and I am curious what might be causing it. Also, digestion is taking up to 10 seconds; is there anything I can optimise to make it faster?

Thanks a ton for this project!

okayzed commented 4 years ago

I see the line that is erroring, but I don't understand why just yet. My initial guess is that a new int column is in the Record but it's not yet seen in the table's KeyTable (how this happens is unknown; I will have to investigate and test my current assumptions).

An easy mitigation is to resize Populated if we see any IDs larger than the current max_int_key. You can try out master to see if this interim fix works.
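
For illustration, the guard described would look something like this (a minimal sketch; the names are made up and not sybil's actual internals):

```go
package sketch

// markPopulated grows the populated slice when a record references a column
// ID past its end, instead of indexing out of range as in the panic above.
// Names here (populated, colID) are illustrative, not sybil's actual code.
func markPopulated(populated []bool, colID int) []bool {
	if colID >= len(populated) {
		grown := make([]bool, colID+1)
		copy(grown, populated)
		populated = grown
	}
	populated[colID] = true
	return populated
}
```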

okayzed commented 4 years ago

My question is: how are new columns being added to the table? Are your samples dynamically getting new columns on each ingestion?

I have ideas for mitigating this, but it is a little weird to me that it's happening. I would like to repro it and fix it properly.

gouthamve commented 4 years ago

> Are your samples dynamically getting new columns on each ingestion?

I am not sure, but I think so: https://github.com/gouthamve/redbull/blob/master/sybil.go#L171-L197

So spans have tags, and different spans can have different sets of tags, and I am creating a column for each tag. This could mean that in 10K rows there is only one row with the column veryUniqueTag. Is that a problem?
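
Roughly, the linked code does this (a paraphrased sketch, not the file verbatim; the types stand in for the Jaeger model):

```go
package sketch

// Tag is a stand-in for a Jaeger span tag (key/value pair).
type Tag struct {
	Key   string
	Value string
}

// spanToRecord flattens each tag into its own top-level field, so every
// previously unseen tag key becomes a brand-new column in the sybil table,
// even if only a single row ever has it (e.g. veryUniqueTag).
func spanToRecord(service string, durationUs int64, tags []Tag) map[string]interface{} {
	record := map[string]interface{}{
		"service":  service,
		"duration": durationUs,
	}
	for _, t := range tags {
		record[t.Key] = t.Value // one dynamic column per tag key
	}
	return record
}
```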

okayzed commented 4 years ago

Depending on what you are doing with those tags, a set column might be appropriate. If you aren't using the UI, you can add the tags as a set column by passing an array of string values to sybil. The set will show up when you run `sybil query -table -info`, and you'll be able to use 'in' and 'nin' with -set-filter on the set fields, like `-set "tags:in:foo"`. The set field will show up in the samples query as an array, but it cannot be grouped by yet in other queries.
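
For example, a record could carry all tags in one array-valued field (a minimal sketch, assuming sybil's JSON ingestion turns a string array into a set column, per the comment above; the field name and values are made up):

```go
package main

import (
	"encoding/json"
	"os"
)

func main() {
	// All tags go into a single "tags" field as an array of strings instead
	// of one column per tag key; the JSON output can be piped into
	// `sybil ingest`.
	record := map[string]interface{}{
		"service": "frontend",
		"tags":    []string{"http.status_code:200", "span.kind:server"},
	}
	json.NewEncoder(os.Stdout).Encode(record)
}
```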

Now that I know the scenario a bit better, I can investigate how to properly fix it.

okayzed commented 4 years ago

The advantages and disadvantages of the two options:

One dynamic column per tag: each tag can be filtered and grouped like any other column, but sparse tags create lots of mostly-empty columns, and new columns showing up on nearly every ingestion is what triggered this panic.

One set column for tags: the schema stays small and 'in'/'nin' filters work across all tags, but set fields cannot be grouped by yet.

I will pursue making the dynamic column generation safer.

okayzed commented 4 years ago

Adding `-save-srb` (from a1d2f05be92ebce9a8e9d00013a88c54173f2c18 in master) should help alleviate this issue. (Crossing fingers.)

okayzed commented 4 years ago

I believe `-save-srb` worked here for tables that have lots of dynamic columns being added. I'm debating whether to always turn it on, but I will add information to the main README about this gotcha.