Node API Overhaul

Last updated: Monday, October 12th, 2020.

Overview

The API layer of a Harmony node is its main interface with the outside world. Therefore, it must be consistent, robust, and well-documented in order to provide easy integration with developers who wish to build on Harmony.

Current issues with v1 & v2 of Harmony's JSON-RPC based API

Inconsistancies of response types (hex v.s. hex string v.s. number) and names (cammelCase v.s. kebab-case v.s. snake_case).
Adding 1 RPC call consists of 5-7 files to edit and often contains duplicate code.
Silent crashes and no logs of any error during a failed procedure calls.
RPC's Http server occasionally fails to start on node boot.
Documentation is not part of the dev cycle and must be manually created afterward.
Testing is not part of the dev cycle.

Context

For Harmony to grow and get external contributors, we must provide a professional interface for developers to hook into. Moreover, an easily consumable Node API can attract developers to take a deeper dive into Harmony's codebase and contribute to the core protocol.

Since Harmony has completed open staking, all core functionalities for an API layer have been defined. Therefore, now is an opportune time to correct the issues stated above as we are in the design phase of other major core features.

It is important to note that Harmony has numerous tools and partners that use the current JSON-RPC based API. Therefore, we must be careful to NOT break/change the existing RPC output format during the overhaul.

Goals and Non-Goals

What this overhaul aims to do:

Create a 'sanity-check' test suite that ensures the format & functionality of the current JSON-RPC.
Clean-up the existing ./internal/hmyapi packages to reduce code duplication & dead code.
Create an additional API layer that provides consistency in all responses and retains all functionality from the v1 & v2 JSON-RPC services.
Integrate documentation as part of the API dev cycle.
Integrate testing as part of the API dev cycle.
Provide error reports of call crashes in the existing & new API layer (low priority).

What this overhaul aims NOT to do:

Fix race conditions of the underlying function calls.
Break old v1 & v2 JSON-RPC functionalities.
Wrap around the existing v1 & v2 JSON-RPC functions to create a new API.

Existing Solution

Harmony uses JSON-RPC 2.0 for its node API (carried over from ethereum).

At a high level, the Harmony node starts an Http server (attached to some given port) for the RPCs and gives it a set of methods (referenced as API in the code) that are offered by the RPC interface.

Harmony uses the geth RPC package found here to handle the RPC's Http server. Said server gets started by the node on boot (here) by ultimately calling the geth function here. The APIs consumed by said function call is defined (here) specifically for the Harmony Node.

The 'common' backend interface, which connects the RPC layer with the node, can be found in ./internal/hmyapi/backend.go. The implementation of the interface that is used by all Harmony nodes can be found at ./hmy/api_backend.go as the APIBackend object, which just wraps around the Harmony object with a cache. Note that the Harmony object is not used in a meaningful way other than just a container of major data structures.

Harmony currently supports 2 RPC method namespaces, hmy & hmyv2. The supposed 'only' difference between the two is that hmyv2 returns values in decimal, whereas hmy returns values in hex. However, there are numerous occasions where hmyv2 is missing a method that hmy has and visa-vera. Moreover, there are various methods that have slightly different signatures, causing further confusion. The primary cause of these issues is that there are 2 distinct copies of each namespace's service and respective methods, which can be found here and here. Because of this code duplication, one has to define a new backend interface in 3 different places and implement it in 2 different places to properly add a new RPC. Couple this with the fact that we don't have any check for consistency between the namespaces, and it becomes easy to see why we have the issues mentioned above.

Lastly, Harmony's RPCs often returns the JSON text marshalled version of various data structures. However, we have inconsistent key naming conventions for said structures.

Proposed Solution

At a high level, we will first fix the existing JSON-RPC methods (focusing on ./internal/hmyapi), then we will implement the Rosetta node API.

The overhaul will have 7 stages: 1) Lock the current RPC behavior with tests. 2) Refactor JSON-RPC helper functions & method definitions. 3) Implement a new node API (Rosetta). 4) Docker node deployment. 5) Integrate Rosetta's testing tool into our CI. 6) Extend our implementation of Rosetta's node API for staking functions. 7) Setup API documentation generation.

1. Lock the current RPC behavior with tests

Similar to our existing PR testing script, we will orchestrate a series of transactions on a localnet and do RPC calls to assert the format and values of the output. Quite some work has been done here using pytest, and we shall follow/extend this format for the tests. We can also add more go-tests as we see fit.

Moreover, we will script the deployment of a localnet and execution of the test script within a (public) dockerfile & image.

A benefit of doing this first is that we are able to ensure that our existing tools won't break with any subsequent changes. Moreover, we should be able to replace the existing PR testing Jenkins job for a simple docker run command that we can integrate with our TravisCI.

We shall create a repo for all PR testing scripts & docker files for clarity.

2. Refactor JSON-RPC helper functions & method definitions

The goal of the refactor is to remove duplicate code, dead code, and needless abstraction.

2.1 Remove the `Backend` interface

Since all RPC methods use a Backend, we can start there to reduce the needless abstraction. Note that we only have 1 implementation of this Backend interface with no plans (or need) to create more implementations. Therefore, we should remove the interface and use the Harmony object as the interface between the Node and API/RPC layer (for reasons described in 2.2).

2.2 Fuse `APIBackend` and `Harmony` together

As stated in the section above, APIBackend just wraps around the Harmony object. Therefore, we can just define all interfaces between the Node and API/RPC layer within the Harmony object.

2.3 Generalize code for the 2 RPC namespaces

We have heavy code duplication at the method definition for each RPC namespace here (apiv1 & apiv2). We should undo the code duplication introduced earlier this year and just carry a flag within each API struct (for example PublicBlockChainAPI) to indicate the RPC namespace. Then, the methods can adjust the output format (hex or decimal) according to this flag.

Since there are some inconsistencies between the signatures of each namespace, we should change the v2 signature to be the same as the v1 signature if we cannot find a suitable fix. Justification for this is that apiv1 has been out for longer and more tools (from Harmony & partners) are built on top of apiv1.

2.4 Remove dead code

There are numerous things leftover from ethereum that Harmony has not implemented (such as addrlock) that can be removed. We should take this time to check & remove this code.

2.5 Rename JSON text marshall key names This task will get done at a later time (TBD). Reason in the footnote for 7/24.

As mentioned in the issues section, we have inconsistent key naming conventions for the responses of our JSON-RPCs. This is partially due to our inconsistent naming conventions for the JSON text marshall of our data structures. As these text marshalls will be used in the implementation of the additional node API, we should make the naming convention consistent with 'snake_case'. However, we CANNOT break the output format of our current RPCs. Therefore, we will add a 'reformatter' of sorts (when necessary) to keep the current output format.

3. Implement a new node API (Rosetta).

The goal of this overhaul is to provide a consistent, robust, and well-documented API. Rosetta's node API provides a low-enough level API specified using OpenAPI 3.0 and works well for Harmony's needs. Moreover, Rosetta provides a SDK along with tooling and tests in their rosetta-cli that will speed up implementation and ensure correctness. We see Rosetta as a standard that other chains will follow, therefore it would be advantageous for us to use this API specification as it will be easier to integrate with future partners and developers.

Rosetta does not provide a spec for staking related functionalities. However, we can extend our implementation of Rosetta's node API for staking commands (done at a later stage).

Note that the implementation of this node API can hook into the Node's 'backend' interface cleanly with the refactor from stage 2. This will greatly speed up the implementation time of this stage.

4. Docker node deployment.

To satisfy the Rosetta node API expectation, we must provide a dockerfile for running a Harmony node. Expectation is outlined here. Note that the dockerfile must build the node and run it (with certain storage restrictions). Fortunately, we have some work done in our dockerfile here that we leverage along with the dockerfile created in stage 1.

Once this stage is done, we would have a complete implementation of Rosetta's node API.

5. Integrate Rosetta's testing tool into our CI.

Rosetta provides tests for the node API via the Rosetta CLI (as seen here). We will integrate said tests into our CI test, either as a stage in a Jenkins job, or as an additional step in the new test suite (stage 1).

6. Extend our implementation of Rosetta's node API for staking functions This is no longer needed with the `/call` endpoint added in the Rosetta specification.

As mentioned in stage 3, Rosetta's node API does not provide a spec for staking related functions. We can integrate staking transactions by defining these transaction types as a supported operation. However, fetching staking related network information will be done by extending our implementation of the spec via an additional /staking endpoint. This way, we don't add fields or overload existing fields in the Rosetta spec, thus making our implementation more upgrade friendly for future versions of Rosetta.

Note that the Rosetta spec was generated using Open API 3.0, therefore we can extend this spec by defining the staking endpoint APIs (example here) and generating the necessary code to implement it (example here). Our intention is to stay as close as we can to the design direction of Rosetta yet remain independent enough for future upgrades of Rosetta's node API.

A separate, design doc will be created when we get to this stage and will be linked here (not specified yet).

With a new set of staking endpoints in this API extension, we must create our own set of tests to integrate into our CI. We shall add a test suite to the pytest from stage 1 since it will more 'future proof' than duplicating & extending the checks done with the Rosetta CLI.

7. Setup API documentation generation.

Since the Rosetta node API is specified using Open API 3.0, we can use existing tools like Swagger UI and Swagger Inspector to generate documentation (with sample requests & responses) from the API specification. Example here. The sample calls can be done on a 'documentation' node to provide consistent responses.

Milestones

Each milestone will have a pull request associated with it.

Start Date: July 9th, 2020

PR	Discription	Target Date
Stage 1	Main repo PR RPC test suite + CI setup	DONE
Stage 2.1	'Backend' RPC/API structure refactor (2.1, 2.2)	DONE
Stage 2.2	Remove duplicate code & remove dead code (2.3, 2.4, & ~~2.5~~)	DONE
Stage 3.1	Rosetta server on node boot + rosetta service router	DONE
Stage 3.2	Rosetta's `/network` & `/block` node endpoint services	DONE
Stage 3.3	Rosetta's `/account` & `/mempool` node endpoint services	DONE
Stage 3.4	Rosetta's `/construction` node endpoint services	DONE
Stage 4	Dockerfile for node build & deployment (Rosetta's node API fully implemented)	DONE
Stage 5	Integrate Rosetta CLI check tool into our CI	DONE

Alternative Solutions

JSON-RPC v3 instead of Rosetta node API

This solution would be to just create another JSON-RPC version that fixes all of the aforementioned issues (after the refactor in stage 2). This would be a simpler approach and would be faster to implement. However, we would like to support a standard that would help partners & developers onboard faster and we see Rosetta's node API spec as the best solution for Harmony to achieve this.

That said, the design of this overhaul should allow for easy implementation of a v3 JSON-RPC in the future if we feel like we need it.

Abandon v2 JSON-RPC

The solution here would be to drop the v2 RPC to simplify stage 2 of the implementation. This would speed up the implementation time, however, we have to consider key partners and developers that currently use v2.

As we have had the v2 RPC for some time now, we should continue to support it. Furthermore, it is likely that the time saved in the implementation now would be repaid later in extra time spent on coordination, support, and patching our existing tools.

Skip 'backend' refactor and go stright to Rosetta node API implementation

While this is possible to do, it will likely introduce further code duplication, which may lead to inconsistencies between the JSON-RPCs and the node API. Also, we need to make our codebase more consumable for external developers, and it is likely that the RPC/API layer will be their introduction to it.

Furthermore, we still need to support the JSON-RPCs for the foreseeable future and will likely add RPCs or functionality to the new node API in the future. However, it is currently very cumbersome to add such things and plenty of dev time is wasted in the process. Therefore, it is worth it to take the time now to refactor the backend.

Overload or add return elements to the new node API replies instead of extra `/staking` endpoint

We want the extensions that we make to Rosetta's node API to be as independent as possible so that it would be easy to upgrade our implementation of the API in the event that the node API spec receives an update. Moreover, this simplifies the implementation of staking functionalities as we do not have to worry about violating the expected returns of the node API.

External Impact

There is a possibility that partners may suffer some downtime when refactoring the RPC's 'backend'. However, extra time has been allocated to define thorough tests that would minimize this possibility. Moreover, we shall do a testnet deployment first and encourage partners (with the help of the bis team) to test their functionality on our testnet. We will also ask P-OPS to test.

Current Scoping & Timeline

This is where updates will be posted regarding current progress & hurdles.

Current Stage: 3

7/9: [Stage 1] Localnet deployment needed to be redone to include external nodes as part of the testing process. Changes are done here. Created a new repo that will contain the tests for stage 1. 7/10: [Stage 1] Finished testing pipeline here. Localnet is built, deployed, and tested in a docker image using a shared volume with the main repo. This allows one to test whatever changes they made locally, allowing for fast iteration in the later stages. 7/11-7/18: [Stage 1] Test cases took a lot of time to write as the 2 different RPC versions had differences that required careful tests. Moreover, some time was spend optimizing the tests to be ran as fast as possible since it will be used a lot in the future sections & by other devs as we implement other initiatives. PR of test merge can be found here. 7/24: [Stage 2] There is a lot of technical debt and duplicate code that needs to be refactored/fixed, even within each RPC version. Therefore, as I join the 2 RPC versions back together (thus removing a large chunk of the duplicate code), I will also fix the in-package code duplication (or at least contain it to be fixed at a later time). Moreover, I will also separate services for clarity (now everything is dumped into 2 files, transactionpool.go & blockchain.go). This comes at a cost of time, but I find it to be much more valuable than correcting the naming conventions of data structures at this time. Also, the data structure correction can be done separately after stage 2 by another engineer if available. 8/11: [Stage 3] Due to the 10 day #DeFi hackathon at Harmony, momentum & context has been lost. This has put a substantial delay in the planned timeline. Moreover, familial commitments towards the beginning of September will slow down dev time (a bit). Lastly, the construction spec has had additions to it since the initial planning pass. The additions will require a little bit more time than anticipated to implement the endpoint. That said, Rosetta's forums are more active nowadays, which should hopefully help us get over hurdles faster should we encounter any. 8/13: [Stage 3] During the PR process of #3297, there was some regression on the CPU usage (evident in the localnet test). This slowed down progress as time was spent dealing with the issue. Timelines have been updated to reflect this. 8/25: [Stage 3] While the PR (#3312) for stage 3.2 got done in time, there were some edge case bugs that were found when testing with the mainnet DB. Therefore, the review process took longer than normal. Delay for this stage was only 1 day, hopefully this can be made up at the later stages. 8/27: [Stage 3] Some decimal precision errors occurred during further testing of staking operations from the /block/transaction endpoint. Moreover, some block rewards were not visible. The following PR #3321 fixes these issues. All effort was put towards fixing said bugs, thus no progress has been made on the account, mempool, or construction endpoints. Some delay is expected at this point -- but should be minimal. 9/7: [Stage 3] Due to the bridge hackathon (where some efforts were made to research multichain) and some personal time off, progress for rosetta was halted. However, some lessons learned from the multichain research (particularly involving signing txs and Docker) will reduce the time spent on the Construction API and the Docker requirements. The target date has been updated (generously) to reflect the delay.

harmony-one / harmony