tendermint / tm-db

Common database interface for various database backends for Tendermint Core and Cosmos SDK
Apache License 2.0
89 stars 136 forks source link

feat(goleveldb,rocksdb): optimization: remove redundant check from iterator #250

Closed faddat closed 2 years ago

faddat commented 2 years ago

This is a cherry pick from terra's performance oriented branch.

It removes an unnecessary check when we use the iterator.

codecov[bot] commented 2 years ago

Codecov Report

Merging #250 (efeed00) into master (d1b9b74) will decrease coverage by 0.49%. The diff coverage is 100.00%.

:exclamation: Current head efeed00 differs from pull request most recent head e0ff0ec. Consider uploading reports for the commit e0ff0ec to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #250      +/-   ##
==========================================
- Coverage   68.54%   68.05%   -0.50%     
==========================================
  Files          27       27              
  Lines        2130     2116      -14     
==========================================
- Hits         1460     1440      -20     
- Misses        595      596       +1     
- Partials       75       80       +5     
Impacted Files Coverage Δ
boltdb.go 57.25% <ø> (ø)
boltdb_batch.go 82.69% <ø> (ø)
boltdb_iterator.go 89.47% <ø> (-2.64%) :arrow_down:
cleveldb.go 70.99% <ø> (ø)
cleveldb_batch.go 82.35% <ø> (ø)
cleveldb_iterator.go 85.89% <ø> (-2.57%) :arrow_down:
rocksdb.go 72.26% <ø> (ø)
rocksdb_batch.go 84.31% <ø> (ø)
goleveldb_iterator.go 80.26% <100.00%> (+3.15%) :arrow_up:
rocksdb_iterator.go 97.46% <100.00%> (+4.44%) :arrow_up:
... and 5 more
Impacted Files Coverage Δ
boltdb.go 57.25% <ø> (ø)
boltdb_batch.go 82.69% <ø> (ø)
boltdb_iterator.go 89.47% <ø> (-2.64%) :arrow_down:
cleveldb.go 70.99% <ø> (ø)
cleveldb_batch.go 82.35% <ø> (ø)
cleveldb_iterator.go 85.89% <ø> (-2.57%) :arrow_down:
rocksdb.go 72.26% <ø> (ø)
rocksdb_batch.go 84.31% <ø> (ø)
goleveldb_iterator.go 80.26% <100.00%> (+3.15%) :arrow_up:
rocksdb_iterator.go 97.46% <100.00%> (+4.44%) :arrow_up:
... and 5 more
faddat commented 2 years ago

@creachadair I really couldn't say for sure if this one is safe.

08d2 commented 2 years ago

It removes an unnecessary check when we use the iterator.

How do you know it is unnecessary?

faddat commented 2 years ago

@08d2 I don't for sure, which is why I said


@creachadair I really couldn't say for sure if this one is safe.
creachadair commented 2 years ago

@creachadair I really couldn't say for sure if this one is safe.

If you're not sure, I'm not sure how I'm supposed to be 😀

08d2 commented 2 years ago

@08d2 I don't for sure, which is why I said


@creachadair I really couldn't say for sure if this one is safe.

Then I suppose this PR should be closed, no?

github-actions[bot] commented 2 years ago

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

faddat commented 2 years ago

None of us should be sure.

My preference: I change this to a draft, and figure out if it's safe.

faddat commented 2 years ago

@creachadair on this, I'd kindly request aid in figuring out how to make the certain determination that this code is safe. It's part of the omnibus upgrade branch. My comments about safety are driven by disabling these tests:

the tests that you can see in common_test.go

08d2 commented 2 years ago

@faddat You have removed a fair amount of code which verifies certain invariants during tests. If that code was added again, would the corresponding tests fail? If they would fail, can you explain why those failures are safe to ignore?

faddat commented 2 years ago

Hi @08d2 -- I have mainly experiential info on this. Basically, those iterator checks that are removed here, are what are required to pass the the tests that I removed here.

"it works".

Now, if you're able to help come up with some new tests, that would be super useful. For both this and #237 -- the reality is that I'm very literally just doing what I can observe to work well.

I would like to improve testing, and restructure this repo, and ensure that we remove badgerdb from future timelines for both the cosmos-sdk and tendermint -- or have very good reason for using it.

08d2 commented 2 years ago

But why did you remove those tests?

faddat commented 2 years ago

ah-- look at the corresponding code

08d2 commented 2 years ago

Looking at the code explains the "what" but not the "why" — can you ELI5? 😇

faddat commented 2 years ago

sure. The tests were dependent on the "redundant check from iterator".

faddat commented 2 years ago

https://github.com/tendermint/tm-db/pull/250/files#diff-61cc79a7f545ad48e7bce30b84f13aa4cfd4250b9898f0f6a18ccb38984acf10

See checknextpanics?

That would return the specific panic that the tests were looking for.

Part of the reason that this was converted to a draft is that I reckon that different tests could be added.

TBH, the larger issue right now is that the master branch is failing tests because... reasons.

Need to solve that, before other things are relevant imo.

08d2 commented 2 years ago

I'm sorry, I don't follow. I'd appreciate it if you could take a few minutes to write out exactly what this PR is doing, and why you believe it is (or could be) safe. Connect the dots for me :) because at the moment, it's not obvious to me why any of these changes have been made, nor that any of them are actually safe.

faddat commented 2 years ago

I mentioned above, I am not fully sure it's safe, and that's part of why it's on draft.

Will work on the fact that main fails tests, then come back.

This is much more serious:

faddat commented 2 years ago

-- anyhow @08d2 -- I'd prefer to keep this as a draft, walk you through reasoning too.

This PR, which is also contained in #237 -- is based on some work from Terra on performance, you can find it at their tm-db fork:

https://github.com/terra-money/tm-db

Unfortunately, they weren't super insistent on the tests passing there.

1) it's possible that this PR is entirely in error eg: maybe their code wasn't safe, either.
2) Seeking help to either:

@08d2 -- clear enough for ya?

Thanks!

PS: I'm running #237 in prod. If you see reason why this is bad (#250) please don't hesitate to lmk your thoughts. It is an easy reversion.

further related matter

I am having an awful time reproducing the performance of #237 as synthetic benchmarks. I know it is a lot faster, cause I'm using it to serve >1b rpc queries /mo.

If you or anyone else has thoughts on how to improve the synthetic benchmarks, I'm all ears.

faddat commented 2 years ago

further-further related matter

On 4th or 5th consideration, I am going to close this PR, and revert.

I can't prove safety on it.

I'm also going to open an issue that explores this but @creachadair and @08d2 are basically right here, there's too little proof behind these changes.