immux / immux1

https://immux.com
0 stars 0 forks source link

Identical ID targets in InsertCommand result in duplicated journal entry #151

Open poppindouble opened 5 years ago

poppindouble commented 5 years ago

I found out this question when I was working on correctness of our benchmark system.

When we insert multi units, which all the units have the same id, in one batch, for example, here is my bench_spec:

bench_spec:

let bench_spec = ArtificialDataBenchSpec {
        name: "journal",
        unicus_port: 10099,
        main: &execute,
        actions: vec![Action::Insert {
            table_spec: (
                row_count,
                Box::new(|_row_number| -> UnitId { UnitId::new(1) }),
                vec![Box::new(
                    |row_number, _row_count| -> (PropertyName, UnitContent) {
                        let property_name = PropertyName::from(format!("{}", row_number).as_str());
                        let unit_content = UnitContent::String("1234".to_string());
                        return (property_name, unit_content);
                    },
                )],
            ),
            num_jsons_per_command: 1,
        }],
        verify_correctness,
        verification_fn: &verify_journal,
        report_period,
    };

Since we are inserting multi units with the same unit_id, so after the insertion, we use InspectCommand to query our db for verifying correctness, then we will get the following result:

("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"1\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"2\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"3\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"4\":\"1234\"}") })
("7", Unit { id: UnitId(1), content: JsonString("{\"5\":\"1234\"}") })
("8", Unit { id: UnitId(1), content: JsonString("{\"6\":\"1234\"}") })
("9", Unit { id: UnitId(1), content: JsonString("{\"7\":\"1234\"}") })
("10", Unit { id: UnitId(1), content: JsonString("{\"8\":\"1234\"}") })
("11", Unit { id: UnitId(1), content: JsonString("{\"9\":\"1234\"}") })
("12", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("13", Unit { id: UnitId(1), content: JsonString("{\"11\":\"1234\"}") })
("14", Unit { id: UnitId(1), content: JsonString("{\"12\":\"1234\"}") })
("15", Unit { id: UnitId(1), content: JsonString("{\"13\":\"1234\"}") })
("16", Unit { id: UnitId(1), content: JsonString("{\"14\":\"1234\"}") })
....

In the above output, each tuple represent (ChainHeight, UnitContent). The above result is expected.

However if I change the num_jsons_per_command from 1 to 10, I got the following result:

("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("4", Unit { id: UnitId(1), content: JsonString("{\"20\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("5", Unit { id: UnitId(1), content: JsonString("{\"30\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("6", Unit { id: UnitId(1), content: JsonString("{\"40\":\"1234\"}") })
("7", Unit { id: UnitId(1), content: JsonString("{\"50\":\"1234\"}") })
("7", Unit { id: UnitId(1), content: JsonString("{\"50\":\"1234\"}") })
("7", Unit { id: UnitId(1), content: JsonString("{\"50\":\"1234\"}") })
("7", Unit { id: UnitId(1), content: JsonString("{\"50\":\"1234\"}") })
....

The problem comes out how insert_executor and inspect_executor works.

In vkv:

               let next_height = self.increment_chain_height()?;
                match write_instruction {
                    DataWriteInstruction::SetMany(set_many) => {
                        for target in set_many.targets.iter() {
                            self.execute_versioned_set(&target.key, &target.value, next_height)?
                        }
                        let record: InstructionRecord = instruction.to_owned().into();
                        if let Err(_) = self.save_instruction_record(&next_height, &record) {
                            return Err(VkvError::SaveInstructionFail.into());
                        }
                        let count = set_many.targets.len();
                        return Ok(Answer::DataAccess(DataAnswer::Write(
                            DataWriteAnswer::SetOk(SetOkAnswer { count }),
                        )));
                    }
fn execute_versioned_set(
        &mut self,
        store_key: &StoreKey,
        value: &StoreValue,
        height: ChainHeight,
    ) -> ImmuxResult<()> {
        let journal: UnitJournal = match self.get_journal(store_key) {
            Err(_error) => UnitJournal {
                value: value.to_owned(),
                update_heights: HeightList::new(&[height]),
            },
            Ok(mut existing_journal) => {
                existing_journal.update_heights.push(height);
                existing_journal.value = value.to_owned();
                existing_journal
            }
        };
        self.set_journal(store_key, &journal)
    }

We increase the height for once, and we save the instruction. so if we insert multi units with same id in one batch, for example, in my bench_spec (num_jsons_per_command: 10), which leads to the result that for the same uinit_id, will have update_heights like this [2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5......]

Then when we inspect our journal, we have:

fn get_value_after_height_recurse(
        &self,
        key: &StoreKey,
        requested_height: &ChainHeight,
        recurse_time: u16,
    ) -> ImmuxResult<StoreValue> {
        if recurse_time > MAX_RECURSION {
            return Err(VkvError::TooManyRecursionInFindingValue.into());
        }
        match self.get_journal(key) {
            Err(error) => return Err(error),
            Ok(journal) => {
                let possible_heights: Vec<_> = journal
                    .update_heights
                    .iter()
                    .take_while(|h| h <= requested_height)
                    .collect();
                for height in possible_heights.into_iter().rev() {
                    let record = self.load_instruction_record(&height)?;
                    let instruction = &record.instruction;
                    match instruction {
                        Instruction::Data(DataInstruction::Write(write)) => match write {
                            DataWriteInstruction::SetMany(set_many) => {
                                for target in &set_many.targets {
                                    if target.key == *key {
                                        return Ok(target.value.clone());
                                    }
                                }
                            }
                            DataWriteInstruction::RevertMany(revert_many) => {
                                for target in &revert_many.targets {
                                    if target.key == *key {
                                        return Ok(self.get_value_after_height_recurse(
                                            key,
                                            &target.height,
                                            recurse_time + 1,
                                        )?);
                                    }
                                }
                                return Err(VkvError::CannotFindSuitableVersion.into());
                            }
                            DataWriteInstruction::RevertAll(revert_all) => {
                                return Ok(self.get_value_after_height_recurse(
                                    key,
                                    &revert_all.target_height,
                                    recurse_time + 1,
                                )?);
                            }
                        },
                        _ => return Err(VkvError::UnexpectedInstruction.into()),
                    }
                }
                return Err(VkvError::CannotFindSuitableVersion.into());
            }
        }
    }

We took out the correct instruction, however in the DataWriteInstruction::SetMany(set_many), once we meet the correct key, we return the target_value, that's why we end up the result like:

("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("2", Unit { id: UnitId(1), content: JsonString("{\"0\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
("3", Unit { id: UnitId(1), content: JsonString("{\"10\":\"1234\"}") })
....

Of course, if we insert multi units in one batch with different unit_id, or, we set num_jsons_per_command: 1, this won't be a problem. I am not sure this a really important things we need to fix for now, after all, batch insert is an internal api, but this behaviour of our db seems wired to me, because, we actually have all the history data, but when we do inspection, the result seems loosing some data.

In my opinion, I think num_jsons_per_command shouldn't have impact on inspection result.

blaesus commented 5 years ago

It's a bug in implementation of handling of DataWriteInstruction::SetMany in VKV. I'll fix this.