paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

1014: Priority is too low #2312

Closed maxsam4 closed 5 years ago

maxsam4 commented 5 years ago

I was load testing substrate and after 4.5k+ tx from a single account, I am getting an error when the transfer amount is lower than 10000000000. It works fine for 10000000000 but not for 1000000000. The limit must be somewhere in between.

The exact error shown in the UI is 1014: Priority is too low: (142 vs 142) (The transa) and there is nothing in the node logs.

The account has sufficient balance. I am able to transfer lower amounts from other accounts. Only this particular account is facing the problem.

maxsam4 commented 5 years ago

fwiw, I am getting the same error when using polkadot-js for submitting tx

gguoss commented 5 years ago

Priority is too low: (142 vs 142): Transactions with the same nonce are waiting to be packaged in the tx-pool.

maxsam4 commented 5 years ago

This is not a nonce Issue as I am unable to do even a single tx with the lower amount. I have tried with current nonce and nonce + 1..1000.

There is no such problem when the transfer amount is greater.

tomusdrw commented 5 years ago

@maxsam4 doesn't your account go below existential limit? In such case it's deleted and the nonce is reset.

maxsam4 commented 5 years ago

@tomusdrw 1) What's existential limit? 2) I don't think the nonce got reset because I am able to do the transaction with high nonce given that the transfer amount is high.

tomusdrw commented 5 years ago
  1. Minimal amount that the account needs to have to exist in the chain. https://github.com/paritytech/substrate/blob/b51f8f2a9d420ad3b507d57f4f4a56e33b2d8ead/srml/balances/src/lib.rs#L45
  2. Possiblity you are running into the transaction pool limit? Could you post logs from runing with -ltxpool=trace or try to increase the limits with CLI flags?
maxsam4 commented 5 years ago

I just restarted the node with -ltxpool=trace and now it accepted the transaction with low transfer amount. I'll try to do a few thousand tx again to see If I face this Issue again.

maxsam4 commented 5 years ago

I am facing the error again after a few transactions. The log is available at https://gist.github.com/maxsam4/1779a0070bce33ae36eabd467a93319f

The log contains two transactions. The first one is with higher transfer amount and it went through without errors. The second one had a low transfer amount and it failed. The nonce was incremented properly so that's not the problem.

cc @tomusdrw

maxsam4 commented 5 years ago

Seems like priority = bytes and as bigger amount takes more bytes, it is going through but the lower amount is failing as the lower amount tx takes fewer bytes.

tomusdrw commented 5 years ago

@maxsam4 the logs are not enough to figure out what happens. Yes, the priority is equal to number of bytes in the current runtime, however the main issue is that the pool should not attempt to replace any transactions but rather add it, so priority shouldn't matter at all.

The transaction will attempt to replace some other transaction only if it provides the same tag or the pool is full. From what I see in the logs the pool is not full - you only have ~500 transactions in the pool, so it means that possibly another transaction with the same nonce was imported earlier - I'd need full logs to verify that.

maxsam4 commented 5 years ago

It seems to me that not all invalid transactions are being flushed out by the pool. I will try to create an MCVE for this.

maxsam4 commented 5 years ago

I tried for a while but couldn't pinpoint what exactly was causing the Issue. It seems to randomly appear after doing thousands of transactions (many of which are invalid). Here are some full logs. https://drive.google.com/file/d/1oGP8_vvcWXG6B-1rIM54ei5Q2y8GX4rg/view?usp=sharing Not sure if these will be of any use but here they are anyway.

More info Even when the node is in the "brocken" state, the substrate UI shows 0 pending tx.

Below is the script that I am using to cause the error. The node won't break in a single or run even a couple run. Usually takes 10-12 runs with some tiny modifications in the script to break the node. Therefore this is not a reliable way to generate the error.

// Required imports
// @ts-check
const { ApiPromise, WsProvider } = require('@polkadot/api');
const { Keyring } = require('@polkadot/keyring');
const BN = require('bn.js');

const BOB = '5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty';

async function main () {
  const provider = new WsProvider('ws://127.0.0.1:9944');
  const api = await ApiPromise.create(provider);
  const keyring = new Keyring();
  const test5 = keyring.addFromMnemonic('enhance embody priority wise tree pig trash reform drum sure zebra canoe', {"name": "test5"}, "sr25519");

  let rawNonce1 = await api.query.system.accountNonce(keyring.getPairs()[0].address());
  let nonce1 = new BN(rawNonce1.toString());
  for (let i = 0; i < 100; i++) {
    const transfer = api.tx.balances.transfer(BOB, 1000);
    let nonce = nonce1;
    transfer.signAndSend(test5, { nonce });
    nonce1 = nonce1.add(new BN(1));
  }
  let noncebk = nonce1;
  nonce1 = nonce1.add(new BN(1));
  for (let i = 0; i < 1000; i++) {
    const transfer = api.tx.balances.transfer(BOB, 1000);
    let nonce = nonce1;
    transfer.signAndSend(test5, { nonce });
    nonce1 = nonce1.add(new BN(1));
  }
  let nonce = noncebk;
  const transfer = api.tx.balances.transfer(BOB, 1000);
  console.log(await transfer.signAndSend(test5, { nonce }));
  for (let i = 0; i < 100; i++) {
    const transfer = api.tx.balances.transfer(BOB, 1000);
    let nonce = nonce1;
    transfer.signAndSend(test5, { nonce });
    nonce1 = nonce1.add(new BN(1));
  }

  process.exit();
}

main().catch(console.error);
tomusdrw commented 5 years ago

Thanks for the logs! I've checked the 6.9stderr.txt file and it seem that you are sending transaction with the same nonce value mulitple times:

2019-04-18 11:04:13.939 jsonrpc-eventloop-0 TRACE txpool  [0xbb54519996774dd772d0f5cf5545198be3126e923d13248988725fc9d07a73e0] WaitingTransaction { imported_at: Instant { tv_sec: 6340, tv_nsec: 93441273 }, transaction: Transaction { hash: 0xbb54519996774dd772d0f5cf5545198be3126e923d13248988725fc9d07a73e0, priority: 140, valid_till: 18446744073709551615, bytes: 140, requires: [...], provides: [a05de342fa2a9aed2f2899c97cdb25ba1ec6d1bedfb39b87afcefa981bb4956c4633000000000000], data: 81ffa05de342fa2a9aed2f2899c97cdb25ba1ec6d1bedfb39b87afcefa981bb4956c80238315976f6f1d7991793ee2cb96af6456881bbdedebecb025c73004ddb517b06f511d41b6ca7fbe18c01977b90e6059e9e67948a54fa0238fa0689aefa40a19cd000300ff8eaf04151687736326c9fea17e25fc5287613693c912909cb226aa4794f26a48a10f}, missing_tags: { }}
...
2019-04-18 11:04:32.228 jsonrpc-eventloop-0 TRACE txpool  [0x06b735c00cc2169efad9f3e6c8597ed1247049dd1859f09d20c559aa448675c1] WaitingTransaction { imported_at: Instant { tv_sec: 6358, tv_nsec: 382338136 }, transaction: Transaction { hash: 0x06b735c00cc2169efad9f3e6c8597ed1247049dd1859f09d20c559aa448675c1, priority: 140, valid_till: 18446744073709551615, bytes: 140, requires: [...], provides: [a05de342fa2a9aed2f2899c97cdb25ba1ec6d1bedfb39b87afcefa981bb4956c4633000000000000], data: 81ffa05de342fa2a9aed2f2899c97cdb25ba1ec6d1bedfb39b87afcefa981bb4956cec230d09b1853e073d436de18f77db9627a1efce1d834aa8214191ed51443b7d0abdfaee6aea05be91817472f21bb12e5ebb6866c97edadfa036294b34364b0a19cd000300ff8eaf04151687736326c9fea17e25fc5287613693c912909cb226aa4794f26a48a10f}, missing_tags: { }}

It provides exactly the same value (which if you dig into the code is just (sender, nonce).encode()). And that's completely expected behavior that if the are two transactions with the same nonce and the same priority the second one submitted to the pool will be rejected, cause it would need higher priority to replace transaction that is already in the pool.

The node won't break in a single or run even a couple run. Usually takes 10-12 runs with some tiny modifications in the script to break the node. Therefore this is not a reliable way to generate the error.

If you are running the script multiple times, you will definitely run into this issue. Note the script takes the state nonce initially (rawNonce1) - it doesn't take into considerations transactions that await in the pool already (see #2263). You need to persist the nonce value between runs or wait before all transactions from the pool are processed before submitting more.

maxsam4 commented 5 years ago

Thanks for looking into this @tomusdrw

Note the script takes the state nonce initially (rawNonce1) - it doesn't take into considerations transactions that await in the pool already (see #2263).

That is intentional. See below.

You need to persist the nonce value between runs or wait before all transactions from the pool are processed before submitting more.

That's the problem, some of these transactions get "stuck". They are never mined or removed from the pool and I am unable to send any more transactions. The UI says 0 queued transactions and empty blocks keep getting mined.

Me spamming the pool with various valid and invalid (same none) tx is just an attempt to get a transaction "stuck". I wasn't able to reliably reproduce this but spamming the node in this way always ends up with a stuck a transaction in the pool.

tomusdrw commented 5 years ago

@maxsam4 Well if for some reason you submit transactions with a nonce gap (meaning that the transaction will end up in the future queue) then yeah, they are going to be stuck forever, cause we first need to include transaction with nonce - 1.

I see that its the case for (at least) some of the log files you submitted:

64stderr.txt:2019-04-18 10:16:32.809 jsonrpc-eventloop-2 DEBUG txpool  Pool Status: Status { ready: 0, ready_bytes: 0, future: 51, future_bytes: 7140 }

notice future: 51 transactions, while none in ready. I'll analyze a bit further

maxsam4 commented 5 years ago

@maxsam4 Well if for some reason you submit transactions with a nonce gap (meaning that the transaction will end up in the future queue) then yeah, they are going to be stuck forever, cause we first need to include transaction with nonce - 1.

Yeah, I am doing that intentionally. However, that should not be affecting the normal new transaction with the proper nonce. A newly send tx should have proper nonce. My first thought was that the node was not mining the future tx once the nonce gap is filled and to test that, I put

let noncebk = nonce1;
nonce1 = nonce1.add(new BN(1));

in the script but the node didn't get stuck. (Not in every run, at least). Maybe the node get's stuck if the nonce gap is filled in a future block rather than current. Or maybe if more than 257 tx with future nonce are stuck when the nonce gap is filled. Can't remember if I tested that.

tomusdrw commented 5 years ago

Do you mind reproducing with -lsubstrate_basic_authorship=trace,txpool=trace? That will produce even more logs, but should help us to figure out what happens during block creation.

maxsam4 commented 5 years ago

Do you mind reproducing with -lsubstrate_basic_authorship=trace,txpool=trace? That will produce even more logs, but should help us to figure out what happens during block creation.

Ok, I'll try.

maxsam4 commented 5 years ago

@tomusdrw Here are two logs with stuck tx in the pool https://drive.google.com/file/d/1vSp8AQOdcjVh9pbxqxQQXWxMS4wGeQ6h/view?usp=sharing

mattrutherford commented 5 years ago

I will try to recreate this condition - just confirming you're on a recent build of substrate @maxsam4 ? Because there was a similar issue that was resolved about a month ago: https://github.com/paritytech/substrate/pull/2136

maxsam4 commented 5 years ago

@mattrutherford Thanks for pointing that out! I was using https://github.com/paritytech/substrate/commit/59d5ec144704cec017a01f3e4dbb6b73c5542bf7 so I think that I had that fix already.

folsen commented 5 years ago

@mattrutherford did you come to any conclusion here?

mattrutherford commented 5 years ago

@folsen Sorry for not updating, I have been so far unable to reproduce. @maxsam4 can you confirm if this is still a problem with recent substrate? if so can you confirm what's in bn.js? tia

maxsam4 commented 5 years ago

@folsen Sorry for not updating, I have been so far unable to reproduce. @maxsam4 can you confirm if this is still a problem with recent substrate? if so can you confirm what's in bn.js? tia

I have not tried it since. I am waiting for BABE to be finalized before I stress test substrate again.

ww8912188 commented 4 years ago

@maxsam4 I encountered the same issue just now and figured out that the root cause is I send my tx too fast. I add a sleep and now the issue is gone.