kadena-io / chainweb-node

Chainweb: A Proof-of-Work Parallel-Chain Architecture for Massive Throughput
https://docs.kadena.io/basics/whitepapers/overview
BSD 3-Clause "New" or "Revised" License
249 stars 95 forks source link

Tx fails in newBlock BuyGas but isn't removed from mempool #829

Closed larskuhtz closed 4 years ago

larskuhtz commented 4 years ago

It seems that certain tx can cause newBlock to fail but are not removed from the mempool. Instead it seems that they get gossiped around and included in each new block, thus block creation of any new blocks on all nodes.

Some background:

Lars 2:21 PM I see errors like this one: Received exception running pact service (execNewBlock): {"tag":"PactInternalError","contents":"tx failure for requestKey when buying gas: (enforce-guard (at 'guard (rea...: Failure: Tx Failed: Keyset failure (keys-all): [caec74f8...]"} Expected? Chainweb.Miner.Miners.localPOW failed: {"tag":"PactInternalError","contents":"{\"tag\":\"PactInternalError\",\"contents\":\"tx failure for requestKey when buying gas: (enforce-guard (at 'guard (rea...: Failure: Tx Failed: Keyset failure (keys-all): [caec74f8...]\"}"}. Restarting ... critical transaction failure: "1La7kZdo-U0e2XYFnthIG62msapZ7YlKpV6aA_y73HY": tx failure for requestKey when buying gas: (enforce-guard (at 'guard (rea...: Failure: Tx Failed: Keyset failure (keys-all): [caec74f8...]

Lars 2:25 PM I see them on all nodes on an ongoing basis.

Hee Kyun 2:37 PM nvm my earlier comment. I rotated the keyset to another key and back, which I confirmed was caec74f8... (edited) The tx should have been rejected right away as a validation failure since test-sender was not > signed properly, but it wasn’t rejected right away and gave me back a requestKey

Lars 2:43 PM So, that’s a bug? (edited)

Hee Kyun 2:45 PM Yeah, but other txs were rejected when sender wasn’t valid (edited)

Lars 2:33 PM I get 121,098 hits for the following log message in the last 10min critical transaction failure: "1La7kZdo-U0e2XYFnthIG62msapZ7YlKpV6aA_y73HY": tx failure > for requestKey when buying gas: (enforce-guard (at 'guard (rea...: Failure: Tx Failed: Keyset failure (keys-all): [caec74f8...]

Linda 2:49 PM This might be a regression to back when buyGas was failing and causing the nodes to spin. >buyGas’s validation should have caught it

larskuhtz commented 4 years ago

The failure comes from this code

    applyBuyGas =
      catchesPactError (buyGas cmd miner) >>= \case
        Left e -> fatal $ "tx failure for requestKey when buying gas: " <> sshow e
        Right _ -> checkTooBigTx initialGas gasLimit applyPayload redeemAllGas

and fatal is implemented as follows:

-- | Denotes fatal failure points in the tx exec process
--
fatal :: Text -> TransactionM db a
fatal e = do
    l <- view txLogger
    rk <- view txRequestKey

    liftIO
      $! logLog l "ERROR"
      $! "critical transaction failure: "
      <> sshow rk <> ": " <> T.unpack e

    internalError e

It throws PactInternalError.

The fact that the transaction id rk is only available in the textual error message indicates that there is no handler that would extract the transaction id and remove it from the mempool.

larskuhtz commented 4 years ago

I wonder if all those exception throw by fatal should be intercepted by an onException that remove the respective tx from the mempool? Or alternatively, all places where this is thrown should call some function that the removes the tx.

When a handler is used the tx id would have to be included as a constructor field (to avoid parsing the textual message).

gregorycollins commented 4 years ago

Yes, I'm adding a constructor to PactException so that we can badlist the offending tx when we call fatal. Most of the rest of the work is plumbing