quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License
12.15k stars 672 forks

- Knock-knock? - Who's there? - Broken segment! #973

Closed ppodolsky closed 3 years ago

ppodolsky commented 3 years ago

Describe the bug: The load profile is the same as in #969: deletions, additions and merges. This time the failure happens at query time, after several hours of serving. I think the root cause is basically the same. At startup and for several hours afterwards all queries were fine, but after several generations of merges searcher.doc started to fail with a VInt decoding error.

Which version of tantivy are you using? https://github.com/tantivy-search/tantivy/commit/bf6e6e8a7cc1826212ba2500b08ecb53dfcdeda1

To Reproduce: I sent the broken segment to you on Gitter.

ppodolsky commented 3 years ago

Failing at https://github.com/tantivy-search/tantivy/blob/main/src/store/reader.rs#L104 on checkpoint (doc=[14958..16689), bytes=[3471326..3478611)), doc_id = 15086
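For readers unfamiliar with this class of error, here is a minimal sketch of how a variable-length integer decoder can fail when it is pointed at the wrong bytes. It assumes a generic LEB128-style layout (7 payload bits per byte, high bit set on every byte except the last); tantivy's actual VInt byte layout and error messages may differ, and `read_varint` is a hypothetical name, not a tantivy function.

```rust
/// Decodes a LEB128-style variable-length integer from the front of `data`.
/// Returns the decoded value and the number of bytes consumed.
/// NOTE: illustrative only; this is not tantivy's VInt implementation.
fn read_varint(data: &[u8]) -> Result<(u64, usize), String> {
    let mut value: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // A u64 never needs more than 10 groups of 7 bits.
        if i >= 10 {
            return Err("varint is longer than 10 bytes".to_string());
        }
        // The low 7 bits carry the payload, least significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the last byte of the integer.
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    // The buffer ended before a terminating byte was seen.
    Err("reached end of buffer while reading varint".to_string())
}
```

With this layout, `read_varint(&[0xAC, 0x02])` yields `(300, 2)`, while a buffer that ends on a continuation byte returns an error. A reader that starts at a wrong offset, or runs past the real end of a block, can hit exactly this kind of failure.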

fulmicoton commented 3 years ago

Thanks I'll investigate on Monday

fulmicoton commented 3 years ago

(doc=[14892..14909), bytes=[3453499..3456556))
(doc=[14909..14926), bytes=[3456556..3460848))
(doc=[14926..14942), bytes=[3460848..3464857))
(doc=[14942..14958), bytes=[3464857..3471326))
(doc=[14958..16689), bytes=[3471326..3478611)) <---
(doc=[16689..16724), bytes=[3478611..3486278)) 
(doc=[16724..16753), bytes=[3486278..3493484))
(doc=[16753..16787), bytes=[3493484..3500905))
(doc=[15087..15131), bytes=[3500905..3508456)) <---
(doc=[15131..15165), bytes=[3508456..3516084))
(doc=[15165..15196), bytes=[3516084..3523442))
(doc=[15196..15228), bytes=[3523442..3530761))
(doc=[15228..15256), bytes=[3530761..3538043))

The bug looks very similar.
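The listing above can be checked mechanically: in a healthy skip index each checkpoint should start exactly where the previous one ends, both in doc ids and in bytes. Below is a minimal sketch of such a check. The `Checkpoint` struct and `first_discontinuity` function are hypothetical stand-ins written for this illustration, not tantivy's own types.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one entry of the doc store skip index.
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    doc_range: Range<u32>,
    byte_range: Range<u64>,
}

/// Returns the index of the first checkpoint that does not start where
/// the previous checkpoint ends, or `None` if all are contiguous.
fn first_discontinuity(checkpoints: &[Checkpoint]) -> Option<usize> {
    checkpoints
        .windows(2)
        .position(|pair| {
            pair[0].doc_range.end != pair[1].doc_range.start
                || pair[0].byte_range.end != pair[1].byte_range.start
        })
        // `position` is the index of the left element of the pair;
        // the offending checkpoint is the right one.
        .map(|i| i + 1)
}
```

Run on the entries listed above, this check would flag only the `doc=[15087..15131)` entry, whose start does not match the previous end of 16787. The `doc=[14958..16689)` entry is contiguous with its neighbours; it stands out only because it claims far more documents than the surrounding blocks of similar byte size, so a contiguity check alone would not catch it.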

fulmicoton commented 3 years ago

Did you enable logging (warn level should be sufficient), and did you see a lot of merge failures before that?

I'd like to know if the assert in block.rs:l.47 triggered several times before you encountered your problem.

ppodolsky commented 3 years ago

I will check it today (around 10-12 UTC) once I get to my laptop.

ppodolsky commented 3 years ago

What I can say now is that this segment definitely came from merging. It is too large to have come from a single write interval.

fulmicoton commented 3 years ago

@ppodolsky just to be sure, this is a brand-new index, meaning it did not contain any segments that had been corrupted previously?

ppodolsky commented 3 years ago

> @ppodolsky just to be sure, this is a brand-new index, meaning it did not contain any segments that had been corrupted previously?

Yep. I rebuilt the whole index after applying the latest commits from your main branch. I will recheck everything today and restart the writes with logging enabled if required. It looks like I will be able to reproduce the issue quickly.

fulmicoton commented 3 years ago

Can you run your program with the following rev? acfb057462422db52f7800e954a5df2fceaf735a

It checks the doc store skip index while it is being written. If there is a problem, it detects it and returns an error. tantivy then abruptly quits the process and logs the segments that were being merged.

The segment files are not removed, so if you send them to me, I should be able to look at the issue. (I think the .store files are sufficient.)
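As a rough illustration of the idea of checking the skip index while it is being written (not the actual code in that rev), a builder can refuse any checkpoint that does not continue from the previous one. `Checkpoint` and `CheckedSkipIndexBuilder` are hypothetical names invented for this sketch.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one entry of the doc store skip index.
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    doc_range: Range<u32>,
    byte_range: Range<u64>,
}

/// Hypothetical builder that validates each checkpoint as it is appended.
#[derive(Debug, Default)]
struct CheckedSkipIndexBuilder {
    checkpoints: Vec<Checkpoint>,
}

impl CheckedSkipIndexBuilder {
    /// Appends `checkpoint`, or returns an error describing the break if it
    /// does not start where the previously written checkpoint ended.
    fn push(&mut self, checkpoint: Checkpoint) -> Result<(), String> {
        if let Some(prev) = self.checkpoints.last() {
            let contiguous = prev.doc_range.end == checkpoint.doc_range.start
                && prev.byte_range.end == checkpoint.byte_range.start;
            if !contiguous {
                return Err(format!(
                    "non-contiguous checkpoint: {:?} followed by {:?}",
                    prev, checkpoint
                ));
            }
        }
        self.checkpoints.push(checkpoint);
        Ok(())
    }
}
```

Failing at write time, rather than at query time hours later, means the error surfaces while the segments involved in the merge are still known and can be logged.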

@ppodolsky

ppodolsky commented 3 years ago

Sure, I will release this rev today. Nothing happened over the last weekend (but the write load was lighter than usual). I'll continue to observe and collect logs, and will keep you informed.

fulmicoton commented 3 years ago

Thank you!

ppodolsky commented 3 years ago

Still no luck catching it. I've begun to doubt what I saw earlier: probably either I or k8s managed to launch a previous version of tantivy for a moment, and it corrupted the segment.

In my defense, there has been no corruption at all during 3 days under rather heavy load. I'll keep watching with logging enabled until the end of the week and will then close the issue if I don't find anything. Most likely everything is OK and I raised a false alarm, sorry.

fulmicoton commented 3 years ago

No worries! You have accumulated enough good karma by finding the bug and spending time reporting it not to worry about that :)

ppodolsky commented 3 years ago

I didn't hit the corruption again, so it was definitely my mistake. Over two weeks of various load profiles there have been no signs of broken segments. Thank you for being patient :)

fulmicoton commented 3 years ago

Thanks for the update!