Error messages refactoring

MaksymZavershynskyi commented 4 years ago

Fixes: #978 and #1813

Overview

As discussed with @vgrichina @evgenykuzyakov @frol @fckt @janedegtiareva , we want the node RPC to return structured error messages, e.g. like these:

{"InvalidTx": {"NotEnoughBalance": {"signer_id": "alice_near", "signer_balance": 1000000000000000000000000, "total_cost": 1000000000000924119500000}} }

And we want the front-end and the Applayer to implement the conversion of the structured error messages to human-readable error messages, like these:

Sender alice_near does not have enough balance 1000000000000000000000000 for operation costing 1000000000000924119500000

More specifically we want nearlib to implement this conversion, which is predicated on having a detailed and exhaustive error specification.

Properties

There are two major differences with how error messages are currently returned by RPC:

We are decoupling a human-readable error message from the node code. This is not expected to have a downside since the software developer that will be interacting with the node directly and not through nearshell/nearlib is expected to be qualified enough to understand the JSON format. The upsides are:
- We can iterate on error message description independently from the node releases;
- Localization;
- We are motivating dApp developers to have dApp-specific error descriptions instead of relying on the system errors, promoting better UX. E.g. "The character cannot be created because there is not enough gold" is better than "Sender alice_near does not have enough balance ... ".
We are using JSON (it seems like people are either indifferent on whether we use JSON vs our own format or are leaning towards JSON) for structured errors returned by the node. The upsides:
- We can get rid this code by decorating error types with serde: https://github.com/nearprotocol/nearcore/blob/d869465f0400678b5df9e29d3bd926e8887d2337/core/primitives/src/views.rs#L677
- We can now have error parameters, see signer_id, signer_balance, total_cost in the above example.

Implementation steps

[x] Get rid of the places where we are prematurely converting error to string, instead of using error hierarchy, e.g. here: https://github.com/nearprotocol/nearcore/blob/f93e13c0be52b8069a98de4792b861f07787f003/core/primitives/src/errors.rs#L67
[x] Also make sure we are not using boxed errors;
[x] Add a detailed rustdoc for every error message exposed through RPC. Generate the rustdoc and refer to it as error message spec (maybe even extract all errors into near-vm-errors);
[x] Replace error_type: String in ExecutionErrorView with error_type: ExecutionError and use simultaneously switch to using JSON in nearlib in places that rely on old format of error_type, e.g.: https://github.com/nearprotocol/nearlib/pull/87/files?file-filters%5B%5D=.js&file-filters%5B%5D=.json&file-filters%5B%5D=.ts#diff-52776b8b9355ae3e46add4ae6ac70c2eR47
[x] Implement the above error spec in the nearlib to produce human-readable error messages from structured error messages;
[x] Replace ExecutionErrorView with ExecutionError in nearcore, and simultaneously remove usage of .type field in nearlib, e.g.: https://github.com/nearprotocol/nearlib/pull/87/files?file-filters%5B%5D=.js&file-filters%5B%5D=.json&file-filters%5B%5D=.ts#diff-52776b8b9355ae3e46add4ae6ac70c2eR47

frol commented 4 years ago

I feel somewhat uneasy about the proposed error message structure:

{
  "InvalidTx": {
    "NotEnoughBalance": {
      "signer_id": "alice_near",
      "signer_balance": 1000000000000000000000000,
      "total_cost": 1000000000000924119500000
    }
  }
}

I have experience working with this format in Actions (stored in Transactions). To get a type of the error I have to query the object keys and use the first key:

const errorType = Object.keys(error)[0]
const errorPayload = error[errorType]

The first line makes a hard to enforce assumption that there is exactly one key in the object.

TypeScript definitions are even more troublesome (extracted a minimized example from Explorer):

export interface CreateAccount {}

export interface FunctionCall {
  args: string;
  deposit: string;
  gas: number;
  method_name: string;
}

export interface Transfer {
  deposit: string;
}

export interface DeleteKey {
  public_key: string;
}

export interface Action {
  CreateAccount: CreateAccount;
  FunctionCall: FunctionCall;
  Transfer: Transfer;
  DeleteKey: DeleteKey;
}

export interface Actions {
  actions: Action[];
}

Action interface does not enforce the exactly-one possible key, it just "marks" (it is optional by default in TypeScript) all of them optional.

The following structure is much easier to handle:

{
  "kind": "InvalidTx",
  "details": {
    "kind": "NotEnoughBalance",
    "details": {
      "signer_id": "alice_near",
      "signer_balance": 1000000000000000000000000,
      "total_cost": 1000000000000924119500000
    }
  }
}

However, there is another type system problem here, that is the details type is not tight together with the kind.

lexfrl commented 4 years ago

I like the format @frol proposed. It's better for TypeScript. One thing which is a bit confusing is that we have a details field which contains a different types of object (outer "details" field is an Error type with details, inner "details" field is arguments ). What do you guys think about this?

{
  "type": "InvalidTx",
  "cause": {
    "type": "NotEnoughBalance",
    "args": {
      "signer_id": "alice_near",
      "signer_balance": 1000000000000000000000000,
      "total_cost": 1000000000000924119500000
    }
  }
}

frol commented 4 years ago

Another thought: keep the initially proposed structure (it is compact and the kind/type is tightly linked with the inner value [details / args / cuase]) and implement JS-friendly transformation on the nearlib side.

Ideally, we should be able to auto-generate the public API spec including the OK responses as well as error responses, and then auto-generate the low-level API libraries (json-rpc-provider.ts from nearlib could be auto-generated).

lexfrl commented 4 years ago

Ideally, we should be able to auto-generate the public API spec including the OK responses as well as error responses, and then auto-generate the low-level API libraries (json-rpc-provider.ts from nearlib could be auto-generated).

yeah, that's would be nice to have

MaksymZavershynskyi commented 4 years ago

have experience working with this format in Actions (stored in Transactions). T

Thanks for giving useful insight on what is usable with JS/TS.

@frol

The following structure is much easier to handle:

@fckt

What do you guys think about this?

Both proposed formats are pretty straightforward with serde: https://serde.rs/enum-representations.html

Another thought: keep the initially proposed structure (it is compact and the kind/type is tightly linked with the inner value [details / args / cuase]) and implement JS-friendly transformation on the nearlib side.

JSON is easier to implement and maintain than a custom format. Also, it works only when the hierarchy of the errors messages is composed purely of Rust enums, except for the bottom-most errors. For example, suppose the following error occurs: ExecutionError::Action::ActionError::FunctionCallError::FunctionCallError::HostError::HostError::GasExceeded, which happens when during an execution of smart contract we run out of gas. Every element in this error message is an enum of an enum variant, while GasExceeded can contain some parameters like the amount of prepaid gas. Now imagine, we wanted to also know which method was called, in terms of Rust structures I think it is logical to have this parameter in FunctionCallError instead of HostError or GasExceeded types, which can be done by extending the variant ActionError::FunctionCallError like this:

ActionError {
...,
FunctionCallError{function_name: String, error: FunctionCallError}
}

or another solution. JSON will automatically work with this new format, while with the initially proposed structure it is not clear how to represent error parameters, do we simply pass an array of parameters like: {"function_name": "myfunc", "prepaid_gas": <prepaid gas>} alongside ExecutionError::Action::ActionError::FunctionCallError::FunctionCallError::HostError::HostError::GasExceeded ?

Also, JSON serialization we are getting for free with almost no additional Rust code, while the initially proposed format requires writing conversion, which most likely will be all in one mega-method of ExecutionError. Also, it is not clear what do we do with enum variants vs type names, see Action and ActionError above. Do we include both?

DanielRX commented 4 years ago

I feel somewhat uneasy about the proposed error message structure:

export interface CreateAccount {}

export interface DeleteKey {
  public_key: string;
}

export interface Action {
  CreateAccount: CreateAccount;
  FunctionCall: FunctionCall;
  Transfer: Transfer;
  DeleteKey: DeleteKey;
}

export interface Actions {
  actions: Action[];
}

Isn't the way to type this:

Action = {DeleteKey: DeleteKey} | {CreateAccount: CreateAccount} | ... There is a way to make the union strict, so this would then only match one of those objects for each action? @frol

frol commented 4 years ago

@DanielRX Nice catch! For some reason, the idea has never occurred to me.

DanielRX commented 4 years ago

@frol This also works for exhaustive checking of unions:

union

The first safely checks that the key y exists, and then safely uses x.z The second causes a type error due to z not being a key that exists on all the possible elements of the union.

Which means you can do things like handleAction that will make sure the action is handled correctly

DanielRX commented 4 years ago

I should also mention, your solution with kind is also typable in the same way: kind-union (the error tells you typeof x.value === 'number')

So you don't need to require either format for safety in TS

lexfrl commented 4 years ago

Now imagine, we wanted to also know which method was called, in terms of Rust structures I think it is logical to have this parameter in FunctionCallError instead of HostError or GasExceeded types, which can be done by extending the variant ActionError::FunctionCallError like this:

Yes, that's an interesting point.

I've spent some time on refactoring this is a structure I've got so far. The idea is to hold error context (sent TX, for example) in the common wrapper and to allow use it for 1) matching with the sent Transaction (on the client side) 2) to not have a separate places for the same data. The inner errors could add more error arguments as well.

I really not sure if this structure will work well. I'll really appreciate comments

/// External
#[derive(Debug, Clone, PartialEq, Eq, BorshSerialize, BorshDeserialize, Deserialize, Serialize)]
#[serde(tag = "type")]
pub struct InvalidTxError {
    pub signer_id: AccountId,
    pub public_key: PublicKey,
    pub nonce: Nonce,
    pub receiver_id: AccountId,
    pub block_hash: CryptoHash,
    pub kind: InvalidTxErrorKind,
}

#[derive(Debug, Clone, PartialEq, Eq, BorshSerialize, BorshDeserialize, Deserialize, Serialize)]
pub enum InvalidTxErrorKind {
    InvalidAccessKey(InvalidAccessKeyErrorKind),
    Action(ActionError),
    InvalidSigner,
    SignerDoesNotExist,
    InvalidNonce {
        access_key_nonce: Nonce,
    },
    InvalidReceiver,
    NotEnoughBalance {
        balance: Balance,
        cost: Balance,
    },
    RentUnpaid {
        amount: Balance,
    },
    CostOverflow,
    InvalidChain,
    InvalidSignature,
    Expired,
    AccessKeyNotFound, // moved from InvalidAccessKeyErrorKind 
    /// For this action the sender is required to be equal to the receiver
    OwnershipError,
}

#[derive(Debug, Clone, PartialEq, Eq, BorshSerialize, BorshDeserialize, Deserialize, Serialize)]
#[serde(tag = "type")]
pub enum InvalidAccessKeyErrorKind {
    ReceiverMismatch {
        tx_receiver_id: AccountId,
        ak_receiver_id: AccountId,
    },
    MethodNameMismatch {
        method_name: String,
    },
    ActionError,
    NotEnoughAllowance {
        signer_id: AccountId,
        public_key: PublicKey,
        allowed: Balance,
        cost: Balance,
    },
}

#[derive(Debug, Clone, PartialEq, Eq, BorshSerialize, BorshDeserialize, Deserialize, Serialize)]
pub struct ActionError {
    // position of a failed action in the transaction/receipt
    index: u64,
    kind: ActionErrorKind,
}

#[derive(Debug, Clone, PartialEq, Eq, BorshSerialize, BorshDeserialize, Deserialize, Serialize)]
pub enum ActionErrorKind {
    AccountAlreadyExists,
    AccountDoesNotExist { action: String },
    CreateAccountNotAllowed,
    ActorNoPermission { actor_id, action: String },
    DeleteKeyDoesNotExist { public_key: PublicKey },
    AddKeyAlreadyExists { public_key: PublicKey },
    DeleteAccountStaking,
    DeleteAccountHasRent { amount: Balance },
    TriesToUnstake,
    TriesToStake { stake: Balance, staked: Balance, amount: Balance },
    FunctionCall(VMError),
}

MaksymZavershynskyi commented 4 years ago

We should not duplicate in the errors the information that is already available in the transaction itself, so refactoring InvalidTxErrorKind makes sense, but separating InvalidTxErrorKind from InvalidTxError does not. @evgenykuzyakov do we merkelize errors or plan to merkelize them? If yes, then I think we should avoid duplicating transaction info in errors. However, not including transaction info into errors would require the Applayer to merge the error struct with the transaction info to get the full picture. @frol @vgrichina WDYT?
Using anonymous structs instead of tuples in enum variants is a good idea, because it is more self-explanatory;

lexfrl commented 4 years ago

but separating InvalidTxErrorKind from InvalidTxError does not.

If yes, then I think we should avoid duplicating transaction info in errors. However, not including transaction info into errors would require the Applayer to merge the error struct with the transaction info to get the full picture.

overall you're don't think it's a good idea?

What I was thinking about is to include TX identifier (which is tx.public_key+tx.nonce - is it correct? or maybe we should use tx.hash for?) to allow frontend to match an error with a sent transaction. This especially helps with an async protocol, where you don't have a correspondence b/w request and response.

Also I wanted to add an Action index to add an ability to identify with action has been failed.

lexfrl commented 4 years ago

. Every element in this error message is an enum of an enum variant, while GasExceeded can contain some parameters like the amount of prepaid gas. Now imagine, we wanted to also know which method was called, in terms of Rust structures I think it is logical to have this parameter in FunctionCallError instead of HostError or GasExceeded types, which can be done by extending the variant ActionError::FunctionCallError like this:

ExecutionError::Action::ActionError::FunctionCallError::FunctionCallError::HostError::HostError::GasExceeded

I want to simplify the format. I don't think client should care about the inner error structure (which could change in time also).

Take a look on the proposal:

Client can save the transaction (in Map<txhash, Transaction>), so the TX info is available already - server could just return the following:

{
   txhash: string, // an id which we'll use to refer the TX which was sent
   outcome: {
      logs: [string],
      result: bytes | null,
      // ....
   error_action_index: number | null, // if error is action error - the index of action in the TX where this error was happened
   error: Error | null, // just the actual error, no nesting
   trace: [trace_entry] - // we would want to add a structured trace functionality in future (instead of the long error names (I think long names are ok, but only for identification))
}

Having both TX data + structured error client can have a function format_error(tx, error) which will return the error message.

frol commented 4 years ago

However, not including transaction info into errors would require the Applayer to merge the error struct with the transaction info to get the full picture.

It sounds completely sane and fine to me. I would even say that I seem to prefer this clear separation.

This especially helps with an async protocol

I believe that this should be tracked on the protocol level instead of the "application" level. Thus, this argument for having extra info does not sound right to me.

I am a bit lost here with the general picture, but I seem to generally agree with everything Max proposed.

vgrichina commented 4 years ago

I agree with @fckt that it's good idea to include TX identifier in error. Also I see no problem with duplicating info available in transaction. Providing all relevant info in the error itself improves developer experience on receiving side. It's quite likely that transaction object isn't available at the moment error is processed in Web context (e.g. error is passed through URL redirect by wallet).

MaksymZavershynskyi commented 4 years ago

Yeah, I think we should stop iterating on the design and focus everything on implementation. It does not seem like people have strong opinions against the design and we've been discussing it for a long time now and incorporated everyone's input. So we should freeze the design @fckt which is we only return one error object serialized with JSON from the node.

MaksymZavershynskyi commented 4 years ago

It's quite likely that transaction object isn't available at the moment error is processed in Web context (e.g. error is passed through URL redirect by wallet).

@vgrichina We won't be able to include some transaction info into the error anyway, e.g. if we there an error with deploying contract we won't attach the entire contract code to the error. So this is not going to solve the problem entirely, so whoever designs front-end would need to workaround it by joining transaction info with error anyway.

lexfrl commented 4 years ago

@vgrichina @nearmax We could add a txhash as part of the overall result later, separately from an error object.

lexfrl commented 4 years ago

The backend refactoring is almost done. I spent some time to figure how the error handling would look like on the client side. Take a look:

class ServerError {}

class TimeoutError extends ServerError { }

class Closed extends ServerError { }

class TxError extends ServerError { }

class InvalidAccessKeyError extends TxError { }

class AccessKeyNotFound extends InvalidAccessKeyError {

    account_id?: string;
    public_key?: string;
}

class ActionError extends TxError {
    index?: number;
}

class AccountAlreadyExistsError extends ActionError {
    account_id?: string;
}

class MethodNameMismatch extends ActionError {
    method_name?: string;
}

function request() {
    let e = new AccountAlreadyExistsError();
    e.account_id = "test";
    e.index = 1;
    throw e;
}

function method() {
    try {
        request();
    } catch (e) {
        if (e instanceof ActionError) {
            console.log(e);
        }
    }
    return undefined
}

console.log(method());

The idea is to generate these classes from the Rust types. Also it generate a function which takes a result and returns the instance of the matched class, e.g.:

let result = {"TxError": {"ActionError": {"index": 1, "kind": "{"AccountAlreadyExistsError": {"account_id": "bob"}}}}

So the generated code will go through the object and return the following:

e = new AccountAlreadyExistsError();
e.index = 1;
e.account_id = "bob"

What do you think @nearmax @vgrichina @frol ?

frol commented 4 years ago

@fckt This looks so much better! Great work!

lexfrl commented 4 years ago

Of course Rust type-system is different. For example, we should prevent naming collisions and fields collision in the Rust code.

lexfrl commented 4 years ago

One problem now with the Rust->Typescript generation is that we should maintain our own mapping of Rust primitives -> JS primitives which is done by serde now automatically. I'm looking into https://github.com/GREsau/schemars to figure out if we can use it to generate schema and then generate TS types over it + Error matching on JS side.

frol commented 4 years ago

I suggest we do schemas in another iteration. Let's merge the better error messages first. That would be enough to fix the most common papercuts.

MaksymZavershynskyi commented 4 years ago

Moving this into documentation pipeline adding @amgando to write documentation together with @fckt. Adding @amgando to Assignees;
Changing priority to P1 since there is no urgency for this documentation.

amgando commented 4 years ago

@nearmax seems like i would need to wait for this refactoring to surface in nearlib via https://github.com/nearprotocol/near-wallet/issues/289 before documenting

is that the correct assumption or have i missed some other work i must do here?

ilblackdragon commented 4 years ago

@amgando Wallet issues should not be affecting nearlib . As far as I understand, the errors are already in the nearlib .

I'm assuming we need doc strings with possible errors from RPCs in nearlib or what is actually required here on docs side @nearmax ?

MaksymZavershynskyi commented 4 years ago

@ilblackdragon We already have doc strings in the nearlib. The purpose of this work item to make sure we have in depth explanation on them in some documentation.

Moving it to the Ice Box and removing from the Mainnet release since it is not critical for the Mainnet.

vgrichina commented 4 years ago

@ilblackdragon @nearmax note that this is not merely a documentation issue.

Structured errors work is half assed and it's hard to know what to expect.

Example of error from nearcore:

    response.error {
      code: -32000,
      message: 'Server error',
      data: 'Other Error: block DtdfWgnt3QDEjoeARUK25vg8qh31kc3Lrg6qREsoJsST is ahead of head block 34eT8dRWfDPhZBiRGYvkHDj65ThaXCBTrnM4LQjn2Zab \n' +
        ' Cause: Unknown \n' +
        ' Backtrace:    0: failure::backtrace::Backtrace::new\n' +
        '   1: <actix::address::envelope::SyncEnvelopeProxy<A,M> as actix::address::envelope::EnvelopeProxy>::handle\n' +
        '   2: <actix::contextimpl::ContextFut<A,C> as core::future::future::Future>::poll\n' +
        '   3: tokio::runtime::task::raw::poll\n' +
        '   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll\n' +
        '   5: actix_rt::runtime::Runtime::block_on\n' +
        '   6: near::main\n' +
        '   7: std::rt::lang_start_internal::{{closure}}::{{closure}}\n' +
        '             at rustc/a74d1862d4d87a56244958416fd05976c58ca1a8/src/libstd/rt.rs:52\n' +
        '      std::sys_common::backtrace::__rust_begin_short_backtrace\n' +
        '             at rustc/a74d1862d4d87a56244958416fd05976c58ca1a8/src/libstd/sys_common/backtrace.rs:130\n' +
        '   8: main\n' +
        '   9: __libc_start_main\n' +
        '  10: _start\n'
    }

Note that even actual error message Other Error: block DtdfWgnt3QDEjoeARUK25vg8qh31kc3Lrg6qREsoJsST is ahead of head block 34eT8dRWfDPhZBiRGYvkHDj65ThaXCBTrnM4LQjn2Zab doesn't go into proper place in message.

vgrichina commented 4 years ago

Changed priority to P0 as this thing wasn't implemented properly and errors cannot be handled well on near-api-js side.

vgrichina commented 4 years ago

It is really sad that it's not done properly in more than half a year since issue been created, but on pragmatic side I'd like to at least understand what is current state (so that near-api-js can workaround for critical issues like timeouts and nonce collisions):

when and where structured errors can be expected? Judging by @fckt changes to near-api-js – there are at least 2 different places where structured errors can be found
what are other possible formats of errors from RPC and when/where they are used?

vgrichina commented 4 years ago

Looks like there seems to be consistent error structure for timeouts. Can anyone confirm this is the case?

      response {
        jsonrpc: '2.0',
        error: { code: -32000, message: 'Server error', data: 'Timeout' },
        id: 148
      }

Do I understand correctly that's Timeout here is how enum value gets serialized if it doesn't have any fields as well?

lexfrl commented 4 years ago

This specific issue is all about InvalidTxError and ActionError. Other cases are out the scope of this issue (at least initially).

bowenwang1996 commented 4 years ago

@vgrichina yes that is the case

vgrichina commented 4 years ago

This specific issue is all about InvalidTxError and ActionError. Other cases are out the scope of this issue (at least initially).

I assume you mean implementation only has considered these errors, correct? I don't see anywhere in issue description that it should be the case. It talks about RPC errors in general:

As discussed with @vgrichina @evgenykuzyakov @frol @fckt @janedegtiareva , we want the node RPC to return structured error messages

MaksymZavershynskyi commented 4 years ago

Closing this issue, since the refactoring was done as you can see from checkmarks. Please file individual issues for bugs found within our structured RPC or errors wrapped by near-api-js.

near / nearcore