solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
13.02k stars 4.19k forks source link

Sporadic snapshot verification error #7244

Closed ryoqun closed 4 years ago

ryoqun commented 4 years ago

Problem

I just observed a snapshot error. Details are to be investigated.

ryoqun@ubuqun:~/work/solana/solana$ solana-ledger-tool --version
solana-ledger-tool 0.21.1 [channel=on-ryoqun-fork commit=f5fad5b43d7287ef146477c7625d361d78cd5bcc]
ryoqun@ubuqun:~/work/solana/solana$ solana-ledger-tool --ledger bad-snapshot/ verify
[2019-12-04T03:12:06.315256104Z INFO  solana_ledger::blocktree] Maximum open file descriptors: 65000
[2019-12-04T03:12:06.399748105Z INFO  solana_ledger::blocktree] "bad-snapshot/rocksdb" open took 84ms
Verifying ledger...
[2019-12-04T03:12:06.399904928Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "bad-snapshot/snapshot"
[2019-12-04T03:12:06.400344352Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "bad-snapshot/snapshot.tar.bz2"
[2019-12-04T03:12:07.049287349Z INFO  solana_ledger::snapshot_utils] snapshot untar took 648ms
[2019-12-04T03:12:07.049730855Z INFO  solana_ledger::snapshot_utils] Loading from "/home/ryoqun/work/solana/solana/bad-snapshot/snapshot/.tmpqzVUmX/snapshots/275734/275734"
[2019-12-04T03:12:11.571189724Z INFO  solana_runtime::accounts_db] accounts_db: total_stores: 124 newest_slot: 275734 oldest_slot: 0 max_slot: 0 (num=7) min_slot: 275726 (num=1)
[2019-12-04T03:12:11.571335576Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2019-12-04T03:12:11.571515167Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=124i
thread 'main' panicked at 'Snapshot bank failed to verify', ledger/src/snapshot_utils.rs:228:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

bad-snapshot.zip zipped because github doesn't natively support *.bz2 files.

ryoqun commented 4 years ago

After long investigation, I've finally tracked the internal hash difference (which causes verification error) down to this tiny difference:

I don't know the exact cause however these differences happens spontaneously across all validator nodes. So some logic is wrong. And this precludes random race condition.

What I did: I modified validator to call verify_internal() when creating a snapshot. So this is diff between verification result when creating snapshot and ledger-tool verify's result: So this means mismatch exists when validator creates snapshot (solana-validator thinks it created a correct snapshot with the ad-hoc verification enabled) however it actually didn't.

ubuntu@ip-10-163-204-165:~/solana$ git diff --no-index --patch-with-stat good-494547 bad-494547
 good-494547 => bad-494547 | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/good-494547 b/bad-494547
index 8759863..05740cb 100644
--- a/good-494547 (When running validator created snapshot)
+++ b/bad-494547 (When starting validator ingests snapshot)
@@ -137,7 +137,7 @@ xoring..BankHash bad32fb96cf0d201abae20bfd324f2434819657d22a8ed582d1aa60600621c2
 xoring..BankHash a82992172dd45a981d5d52c0bc21056ac5fc4b216dff3e7c5e472c84dfc30f71 key: 2nWwBTzPKHkGoTUhAePJYVuoYkSi3rBuWfjwLgtTVAab slot: 0 account: Account { lamports: 33333333333334 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: CKSFzhf5mvqXwo7ft9VNcLzKcAHK7hWTwm1YQWDXRSA4 }
 xoring..BankHash df3fad672a5ffa772d068f0d86bc544f00d899ce4cf7824a0d54966826bffbfc key: 2o4CYMae9midVHNp3EeDibDBytYBjESokDHYkWT4bjaa slot: 0 account: Account { lamports: 18446744073709551615 data.len: 4 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 03000000 hash: G2UEXQaGHfCxUMwMSWtehdPYjAvvHWCoNWNvwcFRjoc7 }
 xoring..BankHash b55c31e74534a43c7df7bc7d2c54d4ecba5fab50245c8b31624f3014705fc8d9 key: 2odNUzWEeGfwnt1Lj27EV4EJmuZ5McqdAgnN9uiULpVz slot: 0 account: Account { lamports: 10101010000001 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: DCxLCMpJEpfHqqr36exGCaVPPDhmPQnrGHeRWVbPySZv }
-xoring..BankHash fe93d2fbaeb5e491d37db4e1dd860e77aba7129dbd1776eb1ff4f90f907e2fbf key: 2p3CLZ3dXB3pCwf2wPWeQCFNUyQUtdY85TbemRzqUAhP slot: 494530 account: Account { lamports: 1 data.len: 2054 owner: Vote111111111111111111111111111111111111111 executable: false rent_epoch: 127 data: 08eaac7d2525fb43a83979a59779d2a9c45b7435eeb04e3b46788b6a771115901ae877a5f16dc1b8c92a5f9b5a784eb8a71b076d7728187a09f86460f3fb3a42 hash: J8mHv9fZK57adtx8MfJZ3mGVXuYqq3nGFBGYYCvWmodG }
+xoring..BankHash 2ccafdbdfe1dcaec309450a8893b6c1c52417650751ecdea6d949bc5ac23889c key: 2p3CLZ3dXB3pCwf2wPWeQCFNUyQUtdY85TbemRzqUAhP slot: 494543 account: Account { lamports: 1 data.len: 2054 owner: Vote111111111111111111111111111111111111111 executable: false rent_epoch: 127 data: 08eaac7d2525fb43a83979a59779d2a9c45b7435eeb04e3b46788b6a771115901ae877a5f16dc1b8c92a5f9b5a784eb8a71b076d7728187a09f86460f3fb3a42 hash: 41rTyhXJ11BUubGyUgAu62QBDcirFfZnmoQfG2BrM3Es }
 xoring..BankHash 63c0ade55745293bfed6b3afb1cb98e0a84dd129f33f1d464be63f5b3d7fa251 key: 2pVA5dT6n4CTzcPi7pPjyD8Lt9gfkgJqzQUSFHFmfKnz slot: 440 account: Account { lamports: 1000000000 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 4 data: 0200000000000000000000000db918aed6a438791f97fbdf264c1a0c493657ad964a0a4977037095a353619d0db918aed6a438791f97fbdf264c1a0c493657ad hash: 7iPmacEDU5fkziBhiq1wFwET6ZyzM7hVg6NdKUJZuUTa }
 xoring..BankHash 82446663b7267573c16c7eb554a990a951f9e09c5d56af7fbd061b83c0d013ab key: 2pqKftDZ7UN3f7cf63UxYYTxASwqSf1fiQ4WCCTc5iKx slot: 0 account: Account { lamports: 18446744073709551615 data.len: 4 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 03000000 hash: 9mWVekKomcTffq4n3WDz7PjADzokqK1sD5b8d9FN96fC }
 xoring..BankHash 98fbfcbb9810245591d05590e566eae5ae4fb8fad90846b610e8d03757feb37f key: 2q1homXzSa2E2VfBXJTnW6tJQMayUPT6CtrXS1RHF3x1 slot: 0 account: Account { lamports: 20833333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: BJBq5FZz6uMCMYUcoCUpLxXtVsh2JPpZeyfxgx33119G }
@@ -767,7 +767,7 @@ xoring..BankHash ad2c600e91148a166eb4a0f3358d7d4fbd9e0294b66bcc6d1bbb79b06b9d7df
 xoring..BankHash ac68830abbd1d0a2517732a1e198aca68bf9a0901784b0f225a021135c97bed2 key: 761bwKRwZzwVpaAhuw3xxsEpLemGevDj3RJQHZVpz42 slot: 0 account: Account { lamports: 8333333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: Cc1ZBXcC7sy6qkPGJme5HJNj2nyj58G2cHXpSsJdUsHK }
 xoring..BankHash 9315681dab9309cb75a512d7354b7f40079125eb56ac7c1895ae5e4adeea1bfb key: 76976KNeqYebN5qxemqgvmnyuNKva6kw4hf7o1d3kgD8 slot: 0 account: Account { lamports: 100000000000000 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: Au9s6iNQ7WHVvqwfcnbXYyrG7ttsYexTE33LJGU3Aw8e }
 xoring..BankHash 395465fcd397a42fdd2a0affdf600fee7df1c61badfe9fd97acbdff3992a7835 key: 76NoRXeQABdhxgApRoVhX1znvrP1rwWa9wv1q4FnMHJy slot: 0 account: Account { lamports: 18446744073709551615 data.len: 4 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 03000000 hash: 4rnt797nxiAQP7By7i9PpxfKR35xjs176d6kJWxT9bBi }
-xoring..BankHash bb05c053e5766147a038198ed57da4fa7ada7339069afd570e487cec10483cc8 key: 76U22xYYhXHdQfzQS2VFGsvbkVtjgsLSbDjJ1T2SkakR slot: 494530 account: Account { lamports: 1 data.len: 2054 owner: Vote111111111111111111111111111111111111111 executable: false rent_epoch: 127 data: b3d6115a5f300876b3768d4837bf60568628ec42ffbb95491d7901b0953e2bec5a8c93230f2998e8ab66a88e833723e2bd2811149bc77f1777b70f032522e3e2 hash: Db4KuUL2X34WaTUQ2ocvd8o2KtmsrLkj7GheahzKoCbR }
+xoring..BankHash 08ab80ec312fb27d40924c854916f4fb822dd7f4517784cc5e908ae3af9bf569 key: 76U22xYYhXHdQfzQS2VFGsvbkVtjgsLSbDjJ1T2SkakR slot: 494543 account: Account { lamports: 1 data.len: 2054 owner: Vote111111111111111111111111111111111111111 executable: false rent_epoch: 127 data: b3d6115a5f300876b3768d4837bf60568628ec42ffbb95491d7901b0953e2bec5a8c93230f2998e8ab66a88e833723e2bd2811149bc77f1777b70f032522e3e2 hash: aqwXxtdqWDZaAbcss6voQPpesCk4JsacskAxPjYoYat }
 xoring..BankHash 62db62821c3dcbb88415afa73fb89f2b260f83ef7b41950b243ac466b94eaf01 key: 77uXenX1Y9T2D1pcnHnYsYiwTTHbnzkyrKX5fQFMGVCR slot: 0 account: Account { lamports: 2000000000 data.len: 0 owner: 11111111111111111111111111111111 executable: false rent_epoch: 0 hash: 7etyojF5xc21X5uyzyJGDKxDUU7Sf9QdXtPtk4x6Epic }
 xoring..BankHash ae74e040ee50cfd5cd3914e71e8ee904a5f66275e94c12e2ade8372cfb8ed099 key: 78MDwevr4RGPAaZYVqvh4LSKYCsFtSwhBtYZ1zynTqNw slot: 0 account: Account { lamports: 40404040333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: Ck1JhTN9kMmG4hoax62yunVG4AQ46hfndruVPm7uYUc8 }
 xoring..BankHash cdb971f05565f87e931e2ec96c0b9785f9e487eca70352be79983e847e19c54f key: 79GEGmJcJFQXTYfHKkFfYJtcAVSaHqAqYHoHMQhWvxS4 slot: 0 account: Account { lamports: 8333333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: Er4b2LW2DS6j5wjtjNSwGQbWzM3fdPdAabQyjPYgN9ok }
@@ -1636,7 +1636,7 @@ xoring..BankHash 94b2148d0502486b436b30229e98700e12c36481224502a8cba0ee619ade631
 xoring..BankHash 305bc6bffc294e3c2bf9035a501981862334062cefc5b9a2bdf644f39af4d186 key: D5BwKsdXXwjemd5Pcm9QLKiWSYNhCVyJLqKMjHReRLS4 slot: 0 account: Account { lamports: 25000000000000 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: 4FmjkC1BvcC8QmTHny7AH345VBQcy3S76c7sBZY8y9DX }
 xoring..BankHash 30e7cf3e3ce7841918b8eb405d47849065583805ee9ed73b456085abc1203c43 key: D5viiP5qw64rChT2peUcNHTSXeim39PsvtRuuhDo8C83 slot: 0 account: Account { lamports: 41666666666667 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000ddf2e4c81eafae2d68ac99171b066c87bddb168d6b7c07333cd951f36640163d312fa06ccf1b671b26404a34136161ed2aba9e66 hash: 4HuapJjdA4zGb5sE1ptF26f5gaF2DLw7fHwQ36LBnjW2 }
 xoring..BankHash 2d9c40610686c3db62358fe30fe11bbd5c757d7e66a8675059a9a0ea2260fe97 key: D6y6ZE8ax3KzYTcJ1LQrKHj4dfyJYeUhstL8Fr5F6aQy slot: 0 account: Account { lamports: 2264848333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: 453Y67AnkrhSczBde47M8w8MLfCV3E81rvo3Wbm9mC1k }
-xoring..BankHash 9f62c928c08f92a1bc5a6dfb0eca1a3faeae1a107c5fcbb832314be24ace3762 key: D71JRzjPpHipt8NAWnWb3yZoXezbkGXqSf7TVCir6wvT slot: 494530 account: Account { lamports: 999999995 data.len: 0 owner: 11111111111111111111111111111111 executable: false rent_epoch: 127 hash: BjBC48oWK457ryo3YNu46NWiQ8piKk3RfGsSCKAKUPi9 }
+xoring..BankHash a9cb1d2235fc22af5c1338be22850bbe26d901daf35e7ecbe0ef89ba84475b70 key: D71JRzjPpHipt8NAWnWb3yZoXezbkGXqSf7TVCir6wvT slot: 494543 account: Account { lamports: 999999995 data.len: 0 owner: 11111111111111111111111111111111 executable: false rent_epoch: 127 hash: CRoY5RoM3hvubMP3s7tUbH6KUUGSdM51WySosSC7ZKLT }
 xoring..BankHash aed5a018a5957cacee9c6aba04ae2174726e478277a621fd40a5892ad61fcc88 key: D78LvKLtMtBcx3q9mZQ7mzujVbtniGKCE1d7yCbPJddo slot: 0 account: Account { lamports: 333333333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 0100000000000000000000006a56514c29f6b1de4d46164621d6bd25b337a711f569f9283c1143c7e8fb546eb420af728f58d9f269d6e07fbbaecf6ed6535e53 hash: CmUsWHrFDb2n9ehVaunvUR7xDckE8nm4sX4MTv2FDEPR }
 xoring..BankHash d105256b01998e70fe4b924d30517c010c8b500b6192bc2ca380113277c48645 key: D7Gfs5jgNRSxkv9mZkV4K24S4vHYVBq39G6H1fyE9nad slot: 0 account: Account { lamports: 100000000000000 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: F4vm64e1n9b8uRjDqV4BA9Lv7Wq31QTzg6asJpbxeESG }
 xoring..BankHash 13dde4061eb3dc87e1f585753647c0362c1b82fff01dd9a8472808c4a9461782 key: D7ZbWLcQRjfPAw5dyjKSBJeaqmDNeJbN5Le71nfy9zcW slot: 0 account: Account { lamports: 5050505000000 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: 2LYzH4fbeiMtJQj38hbUurTxRjbCJ8zwh6L2ns8VaYbj }
@@ -2413,7 +2413,7 @@ xoring..BankHash 1758dbd08b16f0179f40717ef34ed7cfb6edc493bc17f847b22db086d8cd215
 xoring..BankHash dbc83e1d3b1fe631a85817eaaa583620b6b34d17d1176823f5fbdd36f489fcc6 key: av9f85uQRNrWGKmzxTs78YYUfLuhb4X9C65iJLz5NTw slot: 0 account: Account { lamports: 133333333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: FnwP84ekNNcBPKehQQgiQbY6DBzAZ1FpkifF35iMX1vu }
 xoring..BankHash f41665669c714f1555741b9f042f30ae8bbf06a15e9fa70678c039102636fc70 key: b56tCU3fs2pSMdgxgrkLeEYLf1ALHMjUQJ4mHXLX4G7 slot: 0 account: Account { lamports: 41666666666666 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: HRpHZLggHM5EUFckC1vFLkZsa3vitRw68pJNNCbaVqWo }
 xoring..BankHash 5ebd807a72d94328dccb50b3f76f492a36df77aeab0f9e7e4eacc19d41de61b5 key: bVuQjn6E7cEx5wFNQvvZXKebwLDT2Chnc4THg6CdvDR slot: 0 account: Account { lamports: 8333333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: 7NpvMtZEJAex7cqzDEZ7THBi5GTEC5gTrPhQ1e4VKebW }
-xoring..BankHash 9b607c884ba413a71ff0e2ab374691319e8d7914e6e68973aef4aca52b876f6c key: boot1Z6jb15CLqpaMTn2CxktktwZpRAVAgHZEW6SxQ7 slot: 494530 account: Account { lamports: 1999999999 data.len: 0 owner: 11111111111111111111111111111111 executable: false rent_epoch: 127 hash: BTXXXMAHgNx2iz9CJ6S5z5Vfnvwxc7bbi6Vf3TEC7zrF }
+xoring..BankHash a7ea8ce2842c540e4e76062d92b1574ca29634b0ac5c26a5b69b43fe0bea5ec3 key: boot1Z6jb15CLqpaMTn2CxktktwZpRAVAgHZEW6SxQ7 slot: 494543 account: Account { lamports: 1999999999 data.len: 0 owner: 11111111111111111111111111111111 executable: false rent_epoch: 127 hash: CJUXLU5fkEbzThDc8oh5dYiYzbh3NWD3HJXwmhKLLdi2 }
 xoring..BankHash f651a17c137b55d9656a7a56f746182b57846b899ae52613f69799c3bbb60399 key: brvgEwJXcS2THdqh1WkXc33z3yxMWZC8CWyLiCE4zX4 slot: 0 account: Account { lamports: 133333333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: HaXVKnR8ztCkEGbpdFDzAtzDmeUnNqK9Lea8PemiSp8p }
 xoring..BankHash 0556edb195fe2b57837f6e308f6c34aa18bafc7a6a4b720225393a8e2621342f key: cgpeiHPYwvgPMyjdmPSQPxXeZTkUPZfr9kEnXJvi1kz slot: 0 account: Account { lamports: 20833333333333 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: MqvGnpX2JDdsNBh8jTCySWkkZNL1WpKsjR2UCSKmYzA }
 xoring..BankHash 690c84cf382ce4e303723122fbfd342e88ed2abbb9fdc823b0513b7a3b0793ea key: chJDXFusQCiotwCiVQxXPUY74HMLoycogTKVvKXWd7B slot: 0 account: Account { lamports: 1515151666667 data.len: 2000 owner: Stake11111111111111111111111111111111111111 executable: false rent_epoch: 0 data: 010000000000000000000000cafebabedeadbeef000000000000000000000000000000000000000000000000cafebabedeadbeef000000000000000000000000 hash: 854t38YbFKoZdD1cQmbVSJns9NJXGzCAsLbUGA1ae9F7 }

Ultimately resulting in this:

ryoqun@ubuqun:~/work/solana/solana$ grep xored /tmp/validator-excerpt.log /tmp/verify.log 
/tmp/validator-excerpt.log:[2019-12-04T21:38:49.427822053Z DEBUG solana_runtime::accounts_db] xored..: computed: BankHash fe1ede03ddfca2d426b64ff1478b5d87df0b65bff42940d4e3a2e0d0af4e68b9 expected: BankHash fe1ede03ddfca2d426b64ff1478b5d87df0b65bff42940d4e3a2e0d0af4e68b9
/tmp/verify.log:[2019-12-04T22:44:44.943422292Z DEBUG solana_runtime::accounts_db] xored..: computed: BankHash 95ca959a63f6a834943a4f7006e5b8116a76f3d1ae7698eb0bab5916c2245b86 expected: BankHash fe1ede03ddfca2d426b64ff1478b5d87df0b65bff42940d4e3a2e0d0af4e68b9
ryoqun commented 4 years ago
ryoqun@ubuqun:~/work/solana/solana$ git log -n 1 HEAD
commit f5fad5b43d7287ef146477c7625d361d78cd5bcc (HEAD -> master)
Author: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Date:   Mon Dec 2 18:11:10 2019 -0800

    Correctly parse ip echo server response and fix broken test (#7196) (#7200)

    automerge
ryoqun@ubuqun:~/work/solana/solana$ git diff
diff --git a/ledger/src/snapshot_utils.rs b/ledger/src/snapshot_utils.rs
index fa09d108d..3bb4c350c 100644
--- a/ledger/src/snapshot_utils.rs
+++ b/ledger/src/snapshot_utils.rs
@@ -152,6 +152,8 @@ where

 pub fn add_snapshot<P: AsRef<Path>>(snapshot_path: P, bank: &Bank) -> Result<()> {
     bank.purge_zero_lamport_accounts();
+    assert!(bank.verify_hash_internal_state(), "assert internal failed");
+
     let slot = bank.slot();
     // snapshot_path/slot
     let slot_snapshot_dir = get_bank_snapshot_dir(snapshot_path, slot);
diff --git a/runtime/src/accounts_db.rs b/runtime/src/accounts_db.rs
index f2d1ec921..26ed35f73 100644
--- a/runtime/src/accounts_db.rs
+++ b/runtime/src/accounts_db.rs
@@ -289,7 +289,7 @@ impl AccountStorageEntry {
         if count > 0 {
             *count_and_status = (count - 1, status);
         } else {
-            warn!("count value 0 for slot {}", self.slot_id);
+            panic!("count value 0 for slot {}", self.slot_id);
         }
         count_and_status.0
     }
@@ -935,7 +935,7 @@ impl AccountsDB {
                 if let Some((pubkey, account, slot)) = option {
                     if !sysvar::check_id(&account.owner) {
                         let hash = BankHash::from_hash(&Self::hash_account(slot, &account, pubkey));
-                        debug!("xoring..{} key: {}", hash, pubkey);
+                        debug!("xoring..{} key: {} slot: {} account: {:?}", hash, pubkey, slot, account);
                         collector.push(hash);
                     }
                 }
@@ -946,6 +946,7 @@ impl AccountsDB {
         }
         let slot_hashes = self.slot_hashes.read().unwrap();
         if let Some(state) = slot_hashes.get(&slot) {
+            debug!("xored..: computed: {} expected: {}", hash_state, *state);
             hash_state == *state
         } else {
             false
@@ -1077,6 +1078,8 @@ impl AccountsDB {

     /// Store the account update.
     pub fn store(&self, slot_id: Slot, accounts: &[(&Pubkey, &Account)]) {
+        assert!(!self.accounts_index.read().unwrap().is_root(slot_id), "store to root!!!");
+
         let hashes = self.hash_accounts(slot_id, accounts);

         let mut store_accounts = Measure::start("store::store_accounts");
sakridge commented 4 years ago

Is there another file? for snapshot 494547?

ryoqun commented 4 years ago

@sakridge Sure!

snapshot-494547.tar.bz2.zip

ryoqun commented 4 years ago

According to @sakridge, we almost certainly spotted the root cause with a PR (#7281) for it already being prepared :) :

[1:09 PM] sakridge: @ryoqun this is what I get when I replayed the tds ledger:
[2019-12-05T04:07:09.105194488Z WARN  solana_ledger::blocktree_processor] slot 494543 is dead
[1:10 PM] sakridgel: so those account updates in 494,543 should not be in the snapshot
ryoqun commented 4 years ago

@sakridge Phew, our guess work was correct after all!! I've confirmed the real snapshot verificaton error currently occurring in TDS cluster is really fixed by the PR (#7281).

snapshot created with the fix:

Note: warning messages are expected things.

+ nice -n 20 solana-ledger-tool-release --ledger from-snapshot verify
[2019-12-06T02:44:34.815630512Z INFO  solana_ledger::blocktree] Maximum open file descriptors: 65000
[2019-12-06T02:44:34.886844175Z INFO  solana_ledger::blocktree] "from-snapshot/rocksdb" open took 71ms
Verifying ledger...
[2019-12-06T02:44:34.886904806Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "from-snapshot/snapshot"
[2019-12-06T02:44:34.886989756Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "from-snapshot/snapshot.tar.bz2"
[2019-12-06T02:44:38.230301280Z INFO  solana_ledger::snapshot_utils] snapshot untar took 3.3s
[2019-12-06T02:44:38.230838175Z INFO  solana_ledger::snapshot_utils] Loading from "/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/snapshots/840030/840030"
[2019-12-06T02:44:38.736560644Z WARN  solana_runtime::accounts_db] AccountsDB error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840022.1124679\" to \"from-snapshot/accounts\": entity not found"
[2019-12-06T02:44:38.736600025Z INFO  solana_runtime::accounts_db] Err(Custom { kind: Other, error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840022.1124679\" to \"from-snapshot/accounts\": entity not found" })
[2019-12-06T02:44:45.051333719Z WARN  solana_runtime::accounts_db] AccountsDB error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840020.1124672\" to \"from-snapshot/accounts\": entity not found"
[2019-12-06T02:44:45.051372920Z INFO  solana_runtime::accounts_db] Err(Custom { kind: Other, error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840020.1124672\" to \"from-snapshot/accounts\": entity not found" })
[2019-12-06T02:44:45.051406290Z WARN  solana_runtime::accounts_db] AccountsDB error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840020.1124674\" to \"from-snapshot/accounts\": entity not found"
[2019-12-06T02:44:45.051417690Z INFO  solana_runtime::accounts_db] Err(Custom { kind: Other, error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840020.1124674\" to \"from-snapshot/accounts\": entity not found" })
[2019-12-06T02:44:45.100739778Z WARN  solana_runtime::accounts_db] AccountsDB error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840021.1124676\" to \"from-snapshot/accounts\": entity not found"
[2019-12-06T02:44:45.100779958Z INFO  solana_runtime::accounts_db] Err(Custom { kind: Other, error: "Unable to move \"/home/ubuntu/solana/from-snapshot/snapshot/.tmpHIVfBW/accounts/840021.1124676\" to \"from-snapshot/accounts\": entity not found" })
[2019-12-06T02:44:45.263708602Z INFO  solana_runtime::accounts_db] accounts_db: total_stores: 262 newest_slot: 840030 oldest_slot: 0 max_slot: 0 (num=7) min_slot: 840009 (num=1)
[2019-12-06T02:44:45.263766002Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2019-12-06T02:44:45.263953583Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=262i
[2019-12-06T02:44:45.294325897Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 7.1s
[2019-12-06T02:44:45.301388409Z INFO  solana_ledger::blocktree_processor] processing ledger from root: 840030...
[2019-12-06T02:44:45.324904253Z INFO  solana_ledger::blocktree_processor] ledger processed in 23ms. 1 fork at 840030
Ok

snapshot created without the fix (fetched from the tds cluster):

+ solana-ledger-tool --ledger for-snapshot-from-cluster/sss verify
[2019-12-06T02:44:35.426048165Z INFO  solana_ledger::blocktree] Maximum open file descriptors: 65000
[2019-12-06T02:44:35.612621004Z INFO  solana_ledger::blocktree] "for-snapshot-from-cluster/sss/rocksdb" open took 186ms
Verifying ledger...
[2019-12-06T02:44:35.612956398Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "for-snapshot-from-cluster/sss/snapshot"
[2019-12-06T02:44:35.613272680Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "for-snapshot-from-cluster/sss/snapshot.tar.bz2"
[2019-12-06T02:44:45.304573236Z INFO  solana_ledger::snapshot_utils] snapshot untar took 9.7s
[2019-12-06T02:44:45.305102780Z INFO  solana_ledger::snapshot_utils] Loading from "/home/ubuntu/solana/for-snapshot-from-cluster/sss/snapshot/.tmpQWWRF6/snapshots/840030/840030"
[2019-12-06T02:44:50.464539026Z INFO  solana_runtime::accounts_db] accounts_db: total_stores: 292 newest_slot: 840030 oldest_slot: 0 max_slot: 0 (num=7) min_slot: 481 (num=1)
[2019-12-06T02:44:50.464748257Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2019-12-06T02:44:50.466227020Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=292i
thread 'main' panicked at 'Snapshot bank failed to verify', ledger/src/snapshot_utils.rs:228:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.