Eblob: lots of errors after updating version from 0.22.16 to 0.23.0

rudneff commented 9 years ago

After updating the eblob library I got some errors. I think they are linked. But first, one question: what does -1 mean in this message?

2015-07-28 19:11:33.291628 2: blob: start
2015-07-28 19:11:33.292360 2: bctl: index: 2/-1, using unsorted index: size: 146208, num: 1523, data: size: 30878861, max blob size: 200000000

In version 0.22.23 this message appeared after a forced restart of my application, and then eblob_init ended with SIGABRT every time I tried to restart. I can no longer reproduce that bug.

Error while starting application every second start (errno 29)

This happens on every second start. The write thread is blocked with errno 29, while the read thread works normally. A restart is required to get things working properly again.

2015-07-28 18:36:39.683964 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:39.684032 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737219917568: going sleep!
2015-07-28 18:36:39.716113 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:39.716170 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737203132160: going sleep!
Stat: W: done: 0.000100 recs: 0 wps: 0 err: 2 R: done: 0.043850 reads: 877 rps: 877 err: 0
Stat: W: done: 0.000100 recs: 0 wps: 0 err: 2 R: done: 0.043850 reads: 877 rps: 0 err: 0
Stat: W: done: 0.000100 recs: 0 wps: 0 err: 2 R: done: 0.043850 reads: 877 rps: 0 err: 0
2015-07-28 18:36:42.684256 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 15040, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:42.684327 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 15040, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737219917568: going sleep!
2015-07-28 18:36:42.716380 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 17873, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:42.716451 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 17873, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737203132160: going sleep!

Thank you for your support.

bioothod commented 9 years ago

bctl: index: 2/-1 shows the current and the maximum index known to eblob. When eblob starts, it doesn't know how many blob files (and indexes) are available, so it has to enumerate them. During this enumeration eblob sets the maximum known index to -1, which lets it determine the real maximum index (it is always greater than -1).
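
The enumeration described above can be pictured with a minimal sketch. This is not eblob's actual code: the function name find_max_index is hypothetical, and the "<base>-0.<index>" file naming (with an optional suffix such as ".index") is an assumption based on the data-0.36 path that appears later in this thread.

```python
import re


def find_max_index(filenames, base="data"):
    """Return the largest blob index found, or -1 if no blob files exist.

    Hypothetical helper: the "<base>-0.<index>[.suffix]" naming is an assumption.
    """
    pattern = re.compile(r"^" + re.escape(base) + r"-0\.(\d+)(?:\..*)?$")
    max_index = -1  # sentinel: every real index is >= 0, so any match replaces it
    for name in filenames:
        match = pattern.match(name)
        if match:
            max_index = max(max_index, int(match.group(1)))
    return max_index
```

Starting from -1 means that an empty directory is reported as "no blobs yet", while any file that matches the pattern raises the maximum to a real index.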

bioothod commented 9 years ago

2015-07-28 18:36:39.683964 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29

A -29 error usually means data corruption: eblob tries to read outside of the index. Try running the eblob_merge tool on your blobs and check whether the issue persists.
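
To see which system error the number in the log corresponds to, a small sketch like the following can help. It assumes the logged value is a negated errno, as the "-29" in the lines above suggests; the helper name describe_errno is hypothetical. The mapping is platform-dependent; on Linux, 29 is ESPIPE.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and message."""
    value = abs(code)  # assumption: the log prints the error negated, e.g. -29
    name = errno.errorcode.get(value, "UNKNOWN")
    return name, os.strerror(value)


# Prints the symbolic name and message for error 29 on the current platform.
print(describe_errno(-29))
```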

rudneff commented 9 years ago

I merged two blobs into one before a presumably "bad" start (one with error 29). Result:

Completed input stream /tmp/test_queue/data-0.36: total: 2, rest: 2
Completed all blobs
Total records: 17972
Written records: 5627
Removed records: 12345
Broken records: 0

After the merge everything works well. But earlier (0.21.16) such problems didn't exist. I mean no merge was needed, whether I stopped the program correctly or not.

bioothod commented 9 years ago

A crash only has to happen once for the corruption to become visible; you were probably lucky not to corrupt data in previous crashes. For example, they might have happened after the sync timeout (you can specify a rather small timeout or even zero, but that heavily affects performance).

agend commented 9 years ago

@bioothod does the merge tool fix broken blobs?

bioothod commented 9 years ago

@agend eblob_merge tries to fix broken blobs: it iterates over blobs and indexes, skips broken and removed entries, and copies all good records into the destination blob. It can process multiple input blobs and produce one output blob.
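
The behaviour described above can be sketched roughly as follows. This is only an illustration, not the eblob_merge implementation: the Record type, its removed and broken flags, and the merge_blobs function are all hypothetical names, and real blobs are files rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    data: bytes
    removed: bool = False  # hypothetical flag: record was deleted
    broken: bool = False   # hypothetical flag: record failed validation


def merge_blobs(inputs):
    """Copy good records from several input blobs into one output list."""
    output = []
    stats = {"total": 0, "written": 0, "removed": 0, "broken": 0}
    for blob in inputs:
        for rec in blob:
            stats["total"] += 1
            if rec.broken:
                stats["broken"] += 1      # skipped: cannot be trusted
            elif rec.removed:
                stats["removed"] += 1     # skipped: already deleted
            else:
                output.append(rec)        # good record goes to the destination
                stats["written"] += 1
    return output, stats
```

The counters mirror the "Total / Written / Removed / Broken records" lines in the merge output quoted earlier in this thread, where 5627 written plus 12345 removed plus 0 broken adds up to the 17972 total.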

agend commented 9 years ago

That is strange, as we never had this issue before (I think since we started using it, a couple of years ago), and here we have a reproducible scenario. Could you take a deeper look into our case? We can also run the test against the previous version of eblob.


bioothod commented 9 years ago

If it is easily reproducible, please show us a backtrace (with the debug package installed) after the SIGABRT.

agend commented 9 years ago

I think we have found the issue, and it's on our side. @rudneff, please close it.

reverbrain / eblob

Eblob: lots of errors after updating version from 0.22.16 to 0.23.0 #133
