pjtatlow / jammdb

Just Another Memory Mapped Database
Apache License 2.0

panicked at 'assertion failed: self.meta.root_page == page_id || self.page_parents.contains_key(&page_id)' #11

Closed. drauschenbach closed this issue 1 year ago.

drauschenbach commented 3 years ago

May 04 17:38:59 c003-n3 on-prem-agent[25208]: thread 'tokio-runtime-worker' panicked at 'assertion failed: self.meta.root_page == page_id || self.page_parents.contains_key(&page_id)', /home/REDACTED/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:723:17

pjtatlow commented 3 years ago

Can you give me any more context to reproduce this reliably?

drauschenbach commented 3 years ago

I'll work on that. For now this is all I have from my systemd journal, where I didn't have extra stack-trace info turned on.

drauschenbach commented 3 years ago

OK it looks like this error occurs on every write attempt now, so I was able to capture the extra backtrace info.

May 04 18:12:31 c003-n3 on-prem-agent[3249]: thread 'main' panicked at 'assertion failed: self.meta.root_page == page_id || self.page_parents.contains_key(&page_id)', /home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:723:17
May 04 18:12:31 c003-n3 on-prem-agent[3249]: stack backtrace:
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    0: rust_begin_unwind
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    1: core::panicking::panic_fmt
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:92:14
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    2: core::panicking::panic
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:50:5
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    3: jammdb::bucket::BucketInner::node
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:723:17
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    4: jammdb::node::Node::merge
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/node.rs:276:39
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    5: jammdb::bucket::BucketInner::rebalance
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:761:41
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    6: jammdb::bucket::BucketInner::rebalance
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:748:35
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    7: jammdb::bucket::Bucket::rebalance
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:354:9
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    8: jammdb::transaction::TransactionInner::rebalance
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/transaction.rs:296:9
May 04 18:12:31 c003-n3 on-prem-agent[3249]:    9: jammdb::transaction::Transaction::commit
May 04 18:12:31 c003-n3 on-prem-agent[3249]:              at ./home/davidr/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/transaction.rs:166:9
May 04 18:12:31 c003-n3 on-prem-agent[3249]:   10: <on_prem_agent::configdb::jamm::ConfigDAOImpl as on_prem_agent::configdb::ConfigDAO>::store_config
...

And my calling code is:

    fn store_config(&self, config: &Config) -> Result<(), Error> {
        trace!("store_config({})", config.id);
        let tx = self.db.tx(true)?;
        // Create the bucket on first use; otherwise reuse the existing one.
        let bucket = match tx.get_bucket(BUCKET_CONFIGS_BY_ID) {
            Err(JammdbError::BucketMissing) => tx.create_bucket(BUCKET_CONFIGS_BY_ID)?,
            Err(e) => return Err(Error::from(e)),
            Ok(b) => b,
        };
        bucket.put(&config.id, Bytes::from(serialize_config(&config)))?;
        tx.commit()?;
        Ok(())
    }
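
For what it's worth, the same bucket lookup can be written more compactly if the jammdb version in use has get_or_create_bucket (it shows up in later comments in this thread). Just a sketch, with the same types as above:

    fn store_config(&self, config: &Config) -> Result<(), Error> {
        let tx = self.db.tx(true)?;
        // get_or_create_bucket replaces the explicit BucketMissing match above.
        let bucket = tx.get_or_create_bucket(BUCKET_CONFIGS_BY_ID)?;
        bucket.put(&config.id, Bytes::from(serialize_config(&config)))?;
        tx.commit()?;
        Ok(())
    }
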
pjtatlow commented 3 years ago

Hmm, if it's happening on every write now, the database file probably got into a bad state somehow. Two things that would be helpful, if possible:

  1. The corrupt database file itself, if you don't have anything too sensitive in there or something

  2. If you know what you did up until the database got into this bad state, so I could reproduce the issue from scratch.

drauschenbach commented 3 years ago

There's nothing confidential within my DB, so you're welcome to have a copy. This is from a Raspberry Pi, so the architecture was armhf, in case you need to know that to interpret the byte order.

agent-config.db.zip

I'll need some time but I'll try to dig through the log history to find the initial failure event.

drauschenbach commented 3 years ago

As for what I was doing when the database became corrupt, a code review shows that the code above is the only write activity happening on a regular basis (besides read-only ops, which I'm assuming are irrelevant).
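
If it helps, a standalone harness along these lines is roughly how that same put/commit path could be driven against a fresh file. Only a sketch: the bucket name and values are stand-ins rather than the real agent data, and it assumes the 0.5-era DB::open.

// Repro sketch: hammer the same put/commit path against a fresh database
// file. Keys and value sizes are stand-ins, not the real agent data; the
// jammdb calls mirror store_config above.
use jammdb::{DB, Error};

fn main() -> Result<(), Error> {
    let db = DB::open("repro.db")?;
    for i in 0u64..100_000 {
        let tx = db.tx(true)?;
        let bucket = match tx.get_bucket("configs-by-id") {
            Err(Error::BucketMissing) => tx.create_bucket("configs-by-id")?,
            Err(e) => return Err(e),
            Ok(b) => b,
        };
        // Vary the value length so pages fill and split over time.
        let value = vec![0u8; 64 + (i as usize % 4096)];
        bucket.put(i.to_be_bytes(), value)?;
        tx.commit()?;
    }
    Ok(())
}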

brandonhamilton commented 2 years ago

I am also having this problem in my application.

Unfortunately I am unable to share my database at the point of the issue, but it occurs consistently with a particular sequence of writes.

My code looks like this:

fn put(&mut self, key: u64, value: &[u8]) -> std::result::Result<(), Error> {
    let tx_result = (|| {
        let tx = self.db.tx(true)?;
        let bucket = tx.get_or_create_bucket("cache")?;
        let r = bucket.put(key.to_be_bytes(), value).map(|_| ());
        tx.commit()?;
        r
    })();
    // ... other code
    tx_result
}
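
One thing I can try is recording the (key, value length) pairs from that sequence and replaying them against a scratch file until the panic shows up. Sketch only; the recorded ops list is hypothetical, but the jammdb calls are the same as above:

// Replay sketch: re-run a recorded sequence of writes against a scratch
// database to isolate the exact sequence that triggers the panic.
// `ops` (key, value length) is a hypothetical recording of the real writes.
fn replay(db: &jammdb::DB, ops: &[(u64, usize)]) -> Result<(), jammdb::Error> {
    for &(key, len) in ops {
        let tx = db.tx(true)?;
        let bucket = tx.get_or_create_bucket("cache")?;
        bucket.put(key.to_be_bytes(), vec![0u8; len])?;
        tx.commit()?;
    }
    Ok(())
}
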
pjtatlow commented 2 years ago

Hey @brandonhamilton, I know there is a race condition somewhere in here, but I'm not able to reproduce it consistently, so I'm having a hard time finding it. Do you have steps that consistently reproduce the error?
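
If anyone can trigger it with a small stress loop along these lines, that would narrow things down a lot. Just a sketch; it assumes the DB handle can be shared across threads, which your tokio/actix setups already imply:

// Stress sketch for the suspected race: one writer thread committing puts
// while a reader opens read-only transactions. Assumes DB is Send + Sync.
use std::sync::Arc;
use std::thread;

fn stress(db: Arc<jammdb::DB>) {
    let writer = {
        let db = db.clone();
        thread::spawn(move || {
            for i in 0u64..10_000 {
                let tx = db.tx(true).unwrap();
                let bucket = tx.get_or_create_bucket("cache").unwrap();
                bucket.put(i.to_be_bytes(), vec![0u8; 512]).unwrap();
                tx.commit().unwrap();
            }
        })
    };
    let reader = thread::spawn(move || {
        for i in 0u64..10_000 {
            let tx = db.tx(false).unwrap();
            if let Ok(bucket) = tx.get_bucket("cache") {
                let _ = bucket.get(i.to_be_bytes());
            }
        }
    });
    writer.join().unwrap();
    reader.join().unwrap();
}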

ArunGust commented 2 years ago

Same issue for me too

thread 'actix-rt|system:0|arbiter:0' panicked at 'assertion failed: self.meta.root_page == page_id || self.page_parents.contains_key(&page_id)', /home/admins/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.5.0/src/bucket.rs:723:17


cargo 1.61.0-nightly (109bfbd 2022-03-17)
release: 1.61.0-nightly
host: x86_64-unknown-linux-gnu
os: Ubuntu 20.04 (focal) [64-bit]

Code:

match jdb.tx(true) {
    Err(err) => Ok(HttpResponse::BadGateway().body(err.to_string())),
    Ok(tx) => {
        match tx.get_bucket("jsmaster") {
            Err(err) => Ok(HttpResponse::BadGateway().body(err.to_string())),
            Ok(pages) => {
                match pages.put(id.clone().as_bytes(),json::stringify(parsed)){
                    Err(err) => Ok(HttpResponse::BadGateway().body(err.to_string())),
                    Ok(_) => {
                        match tx.commit() {
                            Err(err) => Ok(HttpResponse::BadGateway().body(err.to_string())),
                            Ok(_) => Ok(HttpResponse::Ok().body(id))
                        }
                    }
                }
            }
        }
    }
}
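
For what it's worth, the same handler logic flattens out if the jammdb calls go in a closure that returns a Result, as in the earlier snippet in this thread. Sketch only, using the same jdb, id, and parsed values:

// Sketch: same logic as above, flattened with an immediately-invoked
// closure so each jammdb call can use `?` instead of nested matches.
let result: Result<(), jammdb::Error> = (|| {
    let tx = jdb.tx(true)?;
    let bucket = tx.get_bucket("jsmaster")?;
    bucket.put(id.clone().as_bytes(), json::stringify(parsed))?;
    tx.commit()?;
    Ok(())
})();
Ok(match result {
    Ok(()) => HttpResponse::Ok().body(id),
    Err(err) => HttpResponse::BadGateway().body(err.to_string()),
})
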
abc-mikey commented 1 year ago

I'm also hitting this; it's 100% reproducible for me, with a fresh file each time.

pjtatlow commented 1 year ago

@abc-mikey Can you share your code and your system details?

pjtatlow commented 1 year ago

I believe 0.8.0 should resolve this issue. Feel free to open another one if you run into other problems!