quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License

Case of corrupted segment #969

Closed: ppodolsky closed this issue 3 years ago

ppodolsky commented 3 years ago

Probably related to #897

Describe the bug

In continuation of https://gitter.im/tantivy-search/tantivy?at=5ff589c503529b296bd23728: first, I noticed memory bloat in my Tantivy application. After debugging, I found that the merging thread was failing every time while merging a broken store in a particular segment. I've sent you the whole segment privately in Gitter.

Which version of tantivy are you using? https://github.com/tantivy-search/tantivy/commit/a4f33d3823f1bad3ff7a59877f1608615acabe6e

What happened

I used poor man's debugging and launched Tantivy with a patched function (only a print added):

    fn write_storable_fields(&self, store_writer: &mut StoreWriter) -> crate::Result<()> {
        for reader in &self.readers {
            let store_reader = reader.get_store_reader()?;
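            // Segments with deleted docs are rewritten document by document below;
            // segments without deletes are stacked wholesale via store_writer.stack().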
            if reader.num_deleted_docs() > 0 {
                for doc_id in reader.doc_ids_alive() {
                    let doc = store_reader.get(doc_id);
                    if let Err(ref err) = doc {
                        println!("Error: {:?}\nSegment ID: {:?}\nDocID: {}", err, reader.segment_id(), doc_id);
                    }
                    store_writer.store(&doc?)?;
                }
            } else {
                store_writer.stack(&store_reader)?;
            }
        }
        Ok(())
    }

Stdout after the failed merge:

Error: IOError(Custom { kind: InvalidData, error: "Reach end of buffer while reading VInt" })
Segment ID: Seg("e6ece22e")
DocID: 53
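
For context, this error means a variable-length integer (VInt) decoder ran off the end of its buffer. A toy sketch of that failure mode in Python (an LEB128-style encoding is assumed here purely for illustration; tantivy's actual VInt layout differs in its details):

def read_vint(buf: bytes, pos: int) -> tuple[int, int]:
    # 7 data bits per byte; the high bit means "another byte follows"
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            # Corrupted data can keep the continuation bit set until the buffer runs out.
            raise ValueError("Reach end of buffer while reading VInt")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return value, pos

try:
    read_vint(bytes([0xFF, 0xFF]), 0)
except ValueError as err:
    print(err)  # Reach end of buffer while reading VInt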
ppodolsky commented 3 years ago

My 50 cents of research: I have a pure Python library for reading Tantivy store files: https://github.com/izihawa/tantipy. I updated it yesterday so that it can read the new store format with skip_index. The library now takes the last layer of the skip_index, takes the compressed block offsets, and reads & unpacks the blocks.

So I took the library and opened the .store file. The file opened normally, so the last layer of the skip index and all the VInts required to decode it are fine. Hope this helps you dig further and figure out what's wrong.

TantiPy snippet:

import yaml
from tantipy import TantivyCoder, TantivyReader

store = open('e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store', 'rb').read()
with open('schema.yaml') as file:
    coder = TantivyCoder(yaml.safe_load(file.read()))
reader = TantivyReader(store, coder)
len(list(reader.documents()))

59

Update: an attempt to .get the last documents from this segment's store leads to a VInt decoding error. It looks like the skip_index is broken in some way.

fulmicoton commented 3 years ago

ok thanks... I'll have a look at that today! this is very helpful.

fulmicoton commented 3 years ago

The checkpoints seem broken (even those of the last layer)...

[src/store/index/skip_index.rs:80] checkpoint = (doc=[0..38), bytes=[0..6993))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[38..46), bytes=[6993..8399))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[46..61), bytes=[8399..10550))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[61..62), bytes=[10550..10749))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[62..63), bytes=[10749..11151))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[63..64), bytes=[11151..11341))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[64..65), bytes=[11341..11532))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[65..66), bytes=[11532..11874))
[src/store/index/skip_index.rs:80] checkpoint = (doc=[58..59), bytes=[11874..12084))

I suspect a bug in the merge logic when there are deletes.

fulmicoton commented 3 years ago

I added a functional test stress-testing the store, but I haven't been able to reproduce it so far. Can you confirm that you did not build your index with a modified version of tantivy?

ppodolsky commented 3 years ago

It might help to investigate the crime scene, so here is the order of my actions.

  1. Preparing the index: 8 writer threads, 8GB of memory, no querying, no merging, adding documents without deletions, a single commit only at the very end.
  2. Starting serving: merging enabled, 4 writer threads, 1GB of memory, querying, upserting documents (delete, then add), commit every 2 minutes or on demand.

At the first step I got 16-24 (edited: I give the exact number below) equally sized segments holding all the data. When I started serving, the merger decided to merge all of these large segments (could that explain the initial memory bloat? the memory bloat is caused by the failing merge, confirmed by the file timestamps below). New documents were being upserted at the same time, and these upserts definitely overwrote documents that were under merging. I suspect the broken segment appeared at this moment.

Can you confirm that you did not build your index with a modified version of tantivy?

The only difference now is the following (it is in my fork):

pasha@Pashas-MacBook-Air tantivy % git diff upstream/main origin/master
diff --git a/src/lib.rs b/src/lib.rs
index 33baf80d7..e480b3b86 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -157,7 +157,7 @@ pub use self::snippet::{Snippet, SnippetGenerator};

 mod docset;
 pub use self::docset::{DocSet, TERMINATED};
-pub use crate::common::HasLen;
+pub use crate::common::{BinarySerializable, HasLen};
 pub use crate::common::{f64_to_u64, i64_to_u64, u64_to_f64, u64_to_i64};
 pub use crate::core::{Executor, SegmentComponent};
 pub use crate::core::{Index, IndexMeta, Searcher, Segment, SegmentId, SegmentMeta};
fulmicoton commented 3 years ago

Do you have an idea of how long it took to appear? Can I send you a PR to try to detect it at the right moment if you eventually reproduce?

fulmicoton commented 3 years ago

I suspect the broken segment appeared at this moment.

This segment is pretty small. It is very unlikely that it appeared during the initial large merge.

ppodolsky commented 3 years ago

I suspect the broken segment appeared at this moment.

This segment is pretty small. It is very unlikely that it appeared during the initial large merge.

I meant that this segment was a result of upserting new documents while the large segments were being merged.

You know, I have CPU graphs and segment creation times; let me look into the details. I will probably find something.

fulmicoton commented 3 years ago

I meant that this segment was a result of upserting new documents while the large segments were being merged.

Apart from defective RAM, it should not matter though. All the clues I have so far point to a logic error in tantivy in the way we merge stores... especially the stacking, but I cannot find the problem yet.

ppodolsky commented 3 years ago

Here is some long bash output and some probably unrelated highlights; not sure what will be useful, but let it be:

Initial index state; it is split into two files: store.tar (*.store files) and index.tar (everything else).

(venv) pasha@host:~$ tar -tvf dump/store.tar | head
-rw-rw-r-- 1000/1000 414026422 2021-01-05 20:36 181f462a1dca4bc6b3e923dc38946afc.store
-rw-rw-r-- 1000/1000 433821910 2021-01-05 22:47 edf8a3aa77704c708a82ab17a10c0742.store
-rw-rw-r-- 1000/1000 405652611 2021-01-06 02:47 f3e0c3fa55e447bc9a56cedb9a91a379.store
-rw-rw-r-- 1000/1000 428915994 2021-01-05 21:24 cb1e90f9652a415fb767712fa9f7825b.store
-rw-rw-r-- 1000/1000 406850842 2021-01-06 01:59 42368d2b647242a1976e0ca8ae8ce6f9.store
-rw-rw-r-- 1000/1000 425311840 2021-01-05 23:21 4ece9c55151a418280aa7115ed403cf4.store
-rw-rw-r-- 1000/1000 429223477 2021-01-05 21:24 ee140cf366274069b9848a2ba5777eb2.store
-rw-rw-r-- 1000/1000 406520875 2021-01-06 01:58 d9cde8c9e643407282c3b086d19dda7c.store
-rw-rw-r-- 1000/1000 432718035 2021-01-05 22:18 e2e3d29374504338be5945b736f142ce.store
-rw-rw-r-- 1000/1000 414170229 2021-01-06 02:47 70f9e168447c4a05a009e3aee387c763.store
(venv) pasha@host:~$ tar -tvf dump/store.tar | wc -l
66
(venv) pasha@host:~$ tar -tvf dump/index.tar | head
-rw-rw-r-- 1000/1000    579024 2021-01-05 22:18 d7cd81b7ed724fab8ba7e5d830bee126.posidx
-rw-rw-r-- 1000/1000  24184884 2021-01-05 21:24 cb1e90f9652a415fb767712fa9f7825b.fieldnorm
-rw-rw-r-- 1000/1000 169514364 2021-01-05 22:18 c9de1254ccbe4888a07e9a0f117541a4.idx
-rw-rw-r-- 1000/1000 157613031 2021-01-05 23:20 386712fb611c450bb7e2fefc0afa3761.idx
-rw-rw-r-- 1000/1000  71887791 2021-01-05 23:21 4ece9c55151a418280aa7115ed403cf4.pos
-rw-rw-r-- 1000/1000 160869272 2021-01-05 19:33 7885d2b0bb7b4bd78b2085e43515965f.idx
-rw-rw-r-- 1000/1000   6189987 2021-01-06 01:58 0e492b6f62934fdb907471b953f0d825.fast
-rw-rw-r-- 1000/1000  27202835 2021-01-05 22:18 d7cd81b7ed724fab8ba7e5d830bee126.fieldnorm
-rw-rw-r-- 1000/1000    601296 2021-01-05 23:20 386712fb611c450bb7e2fefc0afa3761.posidx
-rw-rw-r-- 1000/1000 169509571 2021-01-05 22:18 076e754dae714274a871a9649cc9fd9a.idx
(venv) pasha@host:~$ tar -tvf dump/index.tar | wc -l
384

The currently biggest files in the index (merging and upserting are disabled at the moment):

(venv) pasha@host:~$ ls -lS ...index| head -n 30
total 44384984
-rw-r--r-- 1 root root 26874313764 Jan  6 07:21 674af0595b314b84965eb0a7b742869f.store
-rw-r--r-- 1 root root 10255159788 Jan  6 07:21 674af0595b314b84965eb0a7b742869f.idx
-rw-r--r-- 1 root root  4641827492 Jan  6 07:21 674af0595b314b84965eb0a7b742869f.pos
-rw-r--r-- 1 root root  1516353764 Jan  6 07:21 674af0595b314b84965eb0a7b742869f.term
-rw-r--r-- 1 root root  1488021107 Jan  6 06:49 674af0595b314b84965eb0a7b742869f.fieldnorm
-rw-r--r-- 1 root root   465006709 Jan  6 07:21 674af0595b314b84965eb0a7b742869f.fast
-rw-r--r-- 1 root root    38486536 Jan  6 07:21 674af0595b314b84965eb0a7b742869f.posidx
-rw-r--r-- 1 root root    15500382 Jan  6 15:16 674af0595b314b84965eb0a7b742869f.124037308.del
-rw------- 1 root root      648549 Jan  7 06:00 meta.json
-rw-r--r-- 1 root root      100867 Jan  6 15:08 0bf1f1e1a9d748028c1a0996e5e48f94.term
-rw-r--r-- 1 root root       47941 Jan  6 15:08 0bf1f1e1a9d748028c1a0996e5e48f94.store
-rw-r--r-- 1 root root       29749 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.term
-rw-r--r-- 1 root root       29118 Jan  6 09:08 667290ba18bc42eebb65a52a27361ee9.term
-rw-r--r-- 1 root root       28073 Jan  6 09:42 386e3a4734ca47b29dbe884c2f03f151.term
-rw-r--r-- 1 root root       20837 Jan  6 09:08 12b7f503172a4f399e0870e0f752c3f5.term
-rw-r--r-- 1 root root       19809 Jan  6 09:38 fa709e13f63643e5a65bd5b1dfcb3260.term
-rw-r--r-- 1 root root       18426 Jan  6 09:10 33869f79d25e45ae8c38979336c05c07.term
-rw-r--r-- 1 root root       17934 Jan  6 15:08 0bf1f1e1a9d748028c1a0996e5e48f94.idx
-rw-r--r-- 1 root root       17759 Jan  6 09:07 7df9ddfa8e8c45d2b7e3ea6ff335a954.term
-rw-r--r-- 1 root root       17637 Jan  6 09:31 eb4fce9772f64e2db04e63891b49641f.term
-rw-r--r-- 1 root root       15894 Jan  6 09:13 4dfb5a6890104ae98b82fe94717a8686.term
-rw-r--r-- 1 root root       15073 Jan  6 09:31 9cc3fc25ed954e359d051d0307a4dca0.term
-rw-r--r-- 1 root root       14850 Jan  6 09:04 94a28fd8f9b94bd096ecca7f3968c1a4.term
-rw-r--r-- 1 root root       14708 Jan  6 09:41 46657e98ed934e71abf1ad7d4bcaf1b1.term
-rw-r--r-- 1 root root       14543 Jan  6 09:41 0dd7f11064b94b489d2b1f87ffdf3c97.term
-rw-r--r-- 1 root root       14273 Jan  6 09:33 5b2391b28ea4423a8abeed3b93b150c2.term
-rw-r--r-- 1 root root       14132 Jan  6 09:28 e7f6dc9cce474437845db7bbb4fd3060.term
-rw-r--r-- 1 root root       13934 Jan  6 09:10 381ec8f9ce5244d7a10044784fc011f9.term
-rw-r--r-- 1 root root       13880 Jan  6 09:41 91279dc2077a4b398b3cbdb2f4efc9e2.term

Broken segment times:

(venv) pasha@host:~$ ls -l index | grep e6ece22e5bca4e0dbe7ce3e4dcbd5bbf
-rw-r--r-- 1 root root         173 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.124002266.del
-rw-r--r-- 1 root root         421 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.fast
-rw-r--r-- 1 root root         950 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.fieldnorm
-rw-r--r-- 1 root root        4107 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.idx
-rw-r--r-- 1 root root        1717 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.pos
-rw-r--r-- 1 root root         277 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.posidx
-rw-r--r-- 1 root root       12301 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store
-rw-r--r-- 1 root root       29749 Jan  6 09:01 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.term

The memory bloat began right after 09:01, so it is confirmed to be linked to the failing merge.

I will add more here if I find anything else.

ppodolsky commented 3 years ago

Do you have an idea of how long it took to appear? Can I send you a PR to try to detect it at the right moment if you eventually reproduce?

It appeared rather fast: between a few minutes and 1-2 hours, at a rate of 100-1000 writes per minute. I have the initial dump and could start everything from scratch. I can apply any PR you send, but the issue looks probabilistic and is not guaranteed to reproduce.

ppodolsky commented 3 years ago

I meant that this segment was a result of upserting new documents while the large segments were being merged.

Apart from defective RAM, it should not matter though. All the clues I have so far point to a logic error in tantivy in the way we merge stores... especially the stacking, but I cannot find the problem yet.

Yes, you are right. I checked the timestamps: the merge of the initial large segments had ended before I started upserting new records.

ppodolsky commented 3 years ago

And I finally found a proper graph: the broken segment was produced at 09:01 (I couldn't be more precise from file times at first, but see the update below), and I enabled the upserting that produced this segment right at 09:01:00-09:01:15. Not sure I have any more interesting graphs.

Update: I found how to show the full time for files.

(venv) pasha@host:~$ ls -la --time-style=full-iso ./index | grep e6ece22e5bca4e0dbe7ce3e4dcbd5bbf
-rw-r--r-- 1 root  root          173 2021-01-06 09:01:16.493195352 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.124002266.del
-rw-r--r-- 1 root  root          421 2021-01-06 09:01:13.941154323 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.fast
-rw-r--r-- 1 root  root          950 2021-01-06 09:01:13.865153102 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.fieldnorm
-rw-r--r-- 1 root  root         4107 2021-01-06 09:01:13.945154388 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.idx
-rw-r--r-- 1 root  root         1717 2021-01-06 09:01:13.949154453 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.pos
-rw-r--r-- 1 root  root          277 2021-01-06 09:01:13.949154453 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.posidx
-rw-r--r-- 1 root  root        12301 2021-01-06 09:01:13.953154516 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store
-rw-r--r-- 1 root  root        29749 2021-01-06 09:01:13.945154388 +0000 e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.term
fulmicoton commented 3 years ago

Only a single bit seems corrupted... The md5 in the file footer is correct. I am starting to suspect your RAM is defective. Can you do a RAM check?

Here is a snippet to patch your file.

content = open("e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store", "rb").read()
offset = 12103
patched_content = content[:offset] + bytes([content[offset] ^ (1 << 3)])  + content[offset+1:]
open("e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store.patched", "wb").write(patched_content)
ppodolsky commented 3 years ago

Only a single bit seems corrupted... The md5 in the file footer is correct. I am starting to suspect your RAM is defective. Can you do a RAM check?

Here is a snippet to patch your file.

content = open("e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store", "rb").read()
offset = 12103
patched_content = content[:offset] + bytes([content[offset] ^ (1 << 3)])  + content[offset+1:]
open("e6ece22e5bca4e0dbe7ce3e4dcbd5bbf.store.patched", "wb").write(patched_content)

That would be epic, it is Samsung RAM with ECC, lol. Sure, I will check, but I need time to offload this server.

fulmicoton commented 3 years ago

If you have ECC that is quite unlikely... :-S

fulmicoton commented 3 years ago

Can you confirm the patch works for you?

ppodolsky commented 3 years ago

I can confirm the patch healed this segment, but it looks like merging has failed again. I relaunched the server with the debugging print enabled to find out where it is failing now. Will post the results here in a while.

fulmicoton commented 3 years ago

putting on sherlock hat

ppodolsky commented 3 years ago

putting on sherlock hat

evading hostile space rays

ppodolsky commented 3 years ago

Sent you the next broken segment. Could you apply your inverted-bit detection magic there?

fulmicoton commented 3 years ago

Ok, the error is super similar, but it is not a single flipped bit this time... So the one-bit thing was just a coincidence.

ppodolsky commented 3 years ago

canceling purchase of space radiation shield for server

fulmicoton commented 3 years ago

Logging this for myself or future generations. Checkpoints are delta encoded, but in packs of eight. Here, in both cases, the start document of the first checkpoint of the second block is off by 6 (8 for the previous faulty segment), hence shifting all of the subsequent checkpoints.
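
A toy illustration of that effect (not tantivy's actual layout, just delta-decoding one pack of checkpoints): once the first doc of a block is written wrong, every checkpoint rebuilt from it shifts with it, which is exactly how the [58..59) checkpoint right after [65..66) in the dump above looks.

def decode_block(first_doc: int, doc_deltas: list[int]) -> list[tuple[int, int]]:
    # rebuild absolute (start, end) doc ranges from the block's first doc and its deltas
    checkpoints, start = [], first_doc
    for delta in doc_deltas:
        checkpoints.append((start, start + delta))
        start += delta
    return checkpoints

deltas = [1, 1, 1]
print(decode_block(66, deltas))  # what the second block should decode to: (66, 67), (67, 68), ...
print(decode_block(58, deltas))  # first doc off by 8, as observed: (58, 59), (59, 60), ...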

fulmicoton commented 3 years ago

you don't have logs by any chance? With RUST_LOG=info, tantivy logs which merges occur. I would like to confirm these two segments have been produced by a merge.

ppodolsky commented 3 years ago

No. I know I should have enabled it when releasing from the master branch, but at that time I decided to make a brave move :) Nevertheless, I can enable it and start from the beginning, but it will take time.

ppodolsky commented 3 years ago

you don't have logs by any chance? With RUST_LOG=info, tantivy logs which merges occur. I would like to confirm these two segments have been produced by a merge.

A side note, imo useful for future debugging: Tantivy could use a more purposeful algorithm for generating the random names of segments. Something like Twitter's ULIDs, which include timestamps and clues pointing to the origin of a segment.
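
For instance, a hypothetical ULID-style name could look like this (just a sketch of the idea, not a proposal for an exact format):

import os
import time

def ulid_like_segment_id() -> str:
    # 48-bit millisecond timestamp followed by 80 random bits, hex-encoded:
    # 32 hex chars, the same length as the current UUID-style ids, but sortable by creation time.
    ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    return f"{ts_ms:012x}{os.urandom(10).hex()}"

print(ulid_like_segment_id())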

fulmicoton commented 3 years ago

that is a good idea. can you open a ticket?

fulmicoton commented 3 years ago

I think I found the bug!

fulmicoton commented 3 years ago

to reproduce, a merge needs to occur with:

In that case first_doc_in_block is not updated, which creates the bug.

ppodolsky commented 3 years ago

Can't wait to try it!

fulmicoton commented 3 years ago

@ppodolsky Thank you very much for your patience and all of your help reporting the bug!

ppodolsky commented 3 years ago

It was fun, thank you too!