lightstreams-network / lightchain

Fast proof-of-authority blockchain based on go-ethereum and tendermint
https://lightstreams.network
GNU General Public License v3.0
55 stars 16 forks source link

Fix BAD_NONCE after running workload test #177

Closed ggarri closed 5 years ago

ggarri commented 5 years ago

How to reproduce it

Actual behaviour

The test is failing with the following output:

  Workload
    ✓ should create an account for testing purposes, not asserting (3487ms)
    1) should return 100 tx receipts whose state is 0x1

  1 passing (15s)
  1 failing

  1) Workload
       should return 100 tx receipts whose state is 0x1:
     Error: Returned error: insufficient funds for gas * price + value

Update

The Error: Returned error: insufficient funds for gas * price + value was solved by fixing a test in commit: https://github.com/lightstreams-network/lightchain/commit/cc5cf4a008a3f007c0a0437ef3da344a497a58ac

Remaining error

ERROR[06-04|16:33:17.135] Nonce not strictly increasing. Expected 8 got 1006 engine=consensus module=ABCI
ERROR[06-04|16:33:17.136] BAD_NONCE                                engine=consensus module=ABCI
INFO [06-04|16:33:17.137] Rejected bad transaction                 engine=consensus module=mempool     tx=DB5BFF06F444DA929DA8BDF21071D2AB8862E0D2BE06278DD221058373DE5E75   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil

Expected behaviour

Generating bulk of TXs should not corrupt a local client's nonce.

ggarri commented 5 years ago

After an increase on the max_packet_msg_payload_size to x3 we noticed an significant increase in the occurrence of this issue.

From the below logs we could see how after tx 791, there is a jump to tx 793, and the tx 792 is captured after tx 796. What maybe be caused by a wrong broadcasted process or a racing condition happening with memory or cpu are over loaded by a large amount of txs

INFO [06-24|11:02:50.563] Checking TX                              engine=consensus module=ABCI        hash=0x64fa1aa7cea08dd23c7821ea51c3d808cb20393c662cedf06ea0a397fcdc76d2 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=790 cost=110500000000000000
INFO [06-24|11:02:50.563] TX validated                             engine=consensus module=ABCI        hash=0x64fa1aa7cea08dd23c7821ea51c3d808cb20393c662cedf06ea0a397fcdc76d2 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e state_nonce=791
INFO [06-24|11:02:50.563] Added good transaction                   engine=consensus module=mempool     tx=5218ECEEEA19F96854BC3A636ED64EB63C6CE3ED4679EF6D166BC2EF38E2698E   res=&{CheckTx:} height=203 total=18
INFO [06-24|11:02:50.564] Submitted transaction                    fullhash=0x29d029b61806e25f707d8bca393f7d9175326adbcb0f2dd0c58ae23a8302b656 recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.564] Checking TX                              engine=consensus module=ABCI        hash=0x29d029b61806e25f707d8bca393f7d9175326adbcb0f2dd0c58ae23a8302b656 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=791 cost=110500000000000000
INFO [06-24|11:02:50.564] TX validated                             engine=consensus module=ABCI        hash=0x29d029b61806e25f707d8bca393f7d9175326adbcb0f2dd0c58ae23a8302b656 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e state_nonce=792
INFO [06-24|11:02:50.564] Added good transaction                   engine=consensus module=mempool     tx=4FA0EC83D56C36D29C2DE57CC46D6D314B118CBF3B8CD3A21CA383CD727D8442   res=&{CheckTx:} height=203 total=19
INFO [06-24|11:02:50.565] Submitted transaction                    fullhash=0x63de9c261b367b968c36f2718c774855fd991fd95fdd8693c61c8568c0604951 recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.565] Submitted transaction                    fullhash=0x35e1e521407eafb2d64b0338377a83cba40edda29dc54126f0d35804eb5f24e3 recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.566] Checking TX                              engine=consensus module=ABCI        hash=0x35e1e521407eafb2d64b0338377a83cba40edda29dc54126f0d35804eb5f24e3 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=793 cost=110500000000000000
ERROR[06-24|11:02:50.566] Nonce not strictly increasing. Expected 792 got 793 engine=consensus module=ABCI
ERROR[06-24|11:02:50.566] BAD_NONCE                                engine=consensus module=ABCI
INFO [06-24|11:02:50.566] Rejected bad transaction                 engine=consensus module=mempool     tx=8999E3C1C0EADDD1D412C9253DDC80573B1EBFED4C37B3C89DC90B935F99AC7C   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil
INFO [06-24|11:02:50.566] Submitted transaction                    fullhash=0x51d61254cc2da52e3ea782aeb6ed78c420910de3193d43ab3d15d7c2967a252e recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.567] Submitted transaction                    fullhash=0xf096eda6a85b2864b18c0b89c0885445cb4711629ad2647c11ae7361872368ec recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.568] Checking TX                              engine=consensus module=ABCI        hash=0x51d61254cc2da52e3ea782aeb6ed78c420910de3193d43ab3d15d7c2967a252e from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=794 cost=110500000000000000
ERROR[06-24|11:02:50.568] Nonce not strictly increasing. Expected 792 got 794 engine=consensus module=ABCI
ERROR[06-24|11:02:50.568] BAD_NONCE                                engine=consensus module=ABCI
INFO [06-24|11:02:50.568] Rejected bad transaction                 engine=consensus module=mempool     tx=A8EBB164DE8DC8C48F3C48969519BD1E9197CE02DE3C63124576636DB3FC4751   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil
INFO [06-24|11:02:50.569] Submitted transaction                    fullhash=0x019157d77dc600b2756f8222e2f5416b330480c183182b24d6817ae8f45fac8b recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.569] Checking TX                              engine=consensus module=ABCI        hash=0xf096eda6a85b2864b18c0b89c0885445cb4711629ad2647c11ae7361872368ec from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=795 cost=110500000000000000
ERROR[06-24|11:02:50.569] Nonce not strictly increasing. Expected 792 got 795 engine=consensus module=ABCI
ERROR[06-24|11:02:50.569] BAD_NONCE                                engine=consensus module=ABCI
INFO [06-24|11:02:50.569] Rejected bad transaction                 engine=consensus module=mempool     tx=9F8DA53A03E467D0777F15BBB7869FCEF3C97F265B17B79B421E5494338F005F   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil
INFO [06-24|11:02:50.570] Checking TX                              engine=consensus module=ABCI        hash=0x019157d77dc600b2756f8222e2f5416b330480c183182b24d6817ae8f45fac8b from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=796 cost=110500000000000000
ERROR[06-24|11:02:50.570] Nonce not strictly increasing. Expected 792 got 796 engine=consensus module=ABCI
ERROR[06-24|11:02:50.570] BAD_NONCE                                engine=consensus module=ABCI
INFO [06-24|11:02:50.570] Rejected bad transaction                 engine=consensus module=mempool     tx=CAA024349AB261BF338DB4AA1B541C982C6F68F8B9E61E2BFEC4E30975BFC2B7   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil
INFO [06-24|11:02:50.570] Submitted transaction                    fullhash=0x0d2b2ecd06bd637a5ba71b2d017299719621ab743a181913d3d7f58264399ecb recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.570] Checking TX                              engine=consensus module=ABCI        hash=0x63de9c261b367b968c36f2718c774855fd991fd95fdd8693c61c8568c0604951 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=792 cost=110500000000000000
INFO [06-24|11:02:50.570] TX validated                             engine=consensus module=ABCI        hash=0x63de9c261b367b968c36f2718c774855fd991fd95fdd8693c61c8568c0604951 from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e state_nonce=793
INFO [06-24|11:02:50.571] Added good transaction                   engine=consensus module=mempool     tx=8D6968C39E54BBA16662439E1EF39F09017744104846AD20B98D39F0562B0AF9   res=&{CheckTx:}                            height=203 total=20
INFO [06-24|11:02:50.571] Submitted transaction                    fullhash=0x0500f82b8de3c97dea441b56e05c79683a593a06cbd7f10179f1b124ae736cda recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
INFO [06-24|11:02:50.571] Checking TX                              engine=consensus module=ABCI        hash=0x0d2b2ecd06bd637a5ba71b2d017299719621ab743a181913d3d7f58264399ecb from=0xc916Cfe5c83dD4FC3c3B0Bf2ec2d4e401782875e nonce=797 cost=110500000000000000
ERROR[06-24|11:02:50.571] Nonce not strictly increasing. Expected 793 got 797 engine=consensus module=ABCI
ERROR[06-24|11:02:50.571] BAD_NONCE                                engine=consensus module=ABCI
INFO [06-24|11:02:50.571] Rejected bad transaction                 engine=consensus module=mempool     tx=A6CF2E3E6285E7AF6FDB5E2EFFD6B7D15DF419BA89A94901F2C68BFFF3719A7F   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil
INFO [06-24|11:02:50.572] Submitted transaction                    fullhash=0xb0755f29b79bcbf91aef247f0712afd6f2fa6a38d862fec394e08d27903c8d24 recipient=0x3fA9f723D02cB15Ce41C1488120F985A9D6E70a2
EnchanterIO commented 5 years ago

Branch to work on this issue: 177_bad_nonce

EnchanterIO commented 5 years ago

First problem was just a wrong test, solved in https://github.com/lightstreams-network/lightchain/commit/cc5cf4a008a3f007c0a0437ef3da344a497a58ac.

I can't locally replicate the bad nonce issue even with 1000 TXs but I continue in trying to do so:

Screen Shot 2019-06-27 at 15 23 45
ggarri commented 5 years ago

I made the following provisional changes in events.go to check if txs are or not pulled in the right order from the subscripted channel

for obj := range db.ethTxsCh {
        db.logger.Debug("Captured NewTxsEvent from pool")
        for _, tx := range obj.Txs {
            db.logger.Debug("Broadcasting tx...", "nonce", tx.Nonce())
            if err := db.consAPI.BroadcastTx(*tx); err != nil {
                db.metrics.BroadcastedErrTxsTotal.Add(1, err.Error())
                db.logger.Error("Error broadcasting tx", "err", err)
            } else {
                db.metrics.BroadcastedTxsTotal.Add(1)
                db.logger.Debug("Broadcasted tx", "nonce", tx.Nonce())
            }
        }
    }

The output generated is the one below. There we can see how tx 53 is being fetched before tx 52, therefore we could conclude there is not racing issue on Lightchain side which made txs to be broadcasted in wrong order.

DEBUG [07-01|10:50:05.131] Captured NewTxsEvent from pool           engine=database
DEBUG [07-01|10:50:05.131] Broadcasting tx...                       engine=database  nonce=53
INFO  [07-01|10:50:05.132] Checking TX                              engine=consensus module=ABCI        hash=0x058ae9d4566696f271b47a30b480a8494452426c7a7bf1e5e229737dae8bae5c                                                             nonce=53 cost=110500000000000000
ERROR [07-01|10:50:05.132] Nonce not strictly increasing. Expected 52 got 53 engine=consensus module=ABCI
ERROR [07-01|10:50:05.132] BAD_NONCE                                engine=consensus module=ABCI
INFO  [07-01|10:50:05.132] Rejected bad transaction                 engine=consensus module=mempool     tx=13A8C39D8BAB21529F8E3613046C142348C8D45E03BDFDAE7AA2C1C010341C2B   res="&{CheckTx:code:1 log:\"BAD_NONCE\" }" err=nil
DEBUG [07-01|10:50:05.133] Broadcasted tx                           engine=database  nonce=53
DEBUG [07-01|10:50:05.133] Captured NewTxsEvent from pool           engine=database
DEBUG [07-01|10:50:05.133] Broadcasting tx...                       engine=database  nonce=52
INFO  [07-01|10:50:05.133] Checking TX                              engine=consensus module=ABCI        hash=0x2a386d334507260b9385d53293746bf7abd92fe2477b6a24fa46be98384c351e                                                             nonce=52 cost=110500000000000000
INFO  [07-01|10:50:05.133] TX validated                             engine=consensus module=ABCI        hash=0x2a386d334507260b9385d53293746bf7abd92fe2477b6a24fa46be98384c351e                                                             state_nonce=53
INFO  [07-01|10:50:05.133] Added good transaction                   engine=consensus module=mempool     tx=519415287FC095D727C85536095ACB1B9F71A50077F639FD7B137035D757D7BE   res=&{CheckTx:}                            height=11 total=47
DEBUG [07-01|10:50:05.134] Broadcasted tx                           engine=database  nonce=52