harmony-one / bounties

The bounty program helps the community take part in the development of the Harmony blockchain. It covers everything from core features to validator tooling, and from dApp development to DeFi integration.
MIT License

draft - rpc end point new db backend #87

Open LeoHChen opened 2 years ago

LeoHChen commented 2 years ago

Description

Build a new RPC endpoint backend service that uses PostgreSQL as the DB backend to serve all RPC calls.

Context

The backend of Harmony's RPC endpoint is an explorer node that serves RPC requests from end users. It has to keep the blockchain in sync while serving many RPC requests at the same time. When RPC requests consume a lot of CPU, memory, or network bandwidth, the explorer node often falls out of sync and can no longer serve RPC requests. The explorer node's DB engine is LDB (LevelDB), which has limitations for an elastic web service, such as "Only a single process (possibly multi-threaded) can access a particular database at a time." Thus, even though we can run multiple nodes behind the RPC endpoint, each node must have its own LDB. This is not elastic: to absorb a surge of RPC requests we would have to spin up new nodes, and each new node takes a very long time to sync its DB.

To build a scalable web service that serves all RPC requests, we need an elastic architecture. In general, a single DB backend can be separated out to serve multiple compute units, which can then be scaled up or down with existing frameworks such as AWS Auto Scaling groups.
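To make the idea of separate compute units concrete, here is a minimal Go sketch of a stateless serving node. It assumes a hypothetical `BlockStore` interface standing in for the shared PostgreSQL backend (an in-memory `memStore` replaces it so the example runs), and uses `eth_blockNumber` only as an example method. Because the handler keeps no chain data of its own, any number of replicas built by `newHandler` return the same answer, which is what would let an auto-scaling group add or remove them freely.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// BlockStore is a hypothetical interface standing in for the shared DB
// backend (e.g. PostgreSQL). Every serving replica reads the same store.
type BlockStore interface {
	LatestBlockNumber() (uint64, error)
}

// memStore is an in-memory stand-in so the sketch runs without a database.
type memStore struct {
	mu     sync.RWMutex
	latest uint64
}

func (m *memStore) LatestBlockNumber() (uint64, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.latest, nil
}

type rpcRequest struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Method  string          `json:"method"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  interface{}     `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// newHandler builds a stateless JSON-RPC handler: all data comes from the
// shared store, so identical replicas can sit behind a load balancer.
func newHandler(store BlockStore) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req rpcRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		resp := rpcResponse{JSONRPC: "2.0", ID: req.ID}
		switch req.Method {
		case "eth_blockNumber": // example method; result is hex-encoded
			n, err := store.LatestBlockNumber()
			if err != nil {
				resp.Error = &rpcError{Code: -32000, Message: err.Error()}
			} else {
				resp.Result = fmt.Sprintf("0x%x", n)
			}
		default:
			resp.Error = &rpcError{Code: -32601, Message: "method not found"}
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
}

// call sends one JSON-RPC request body to a handler and decodes the reply.
func call(h http.Handler, body string) rpcResponse {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body)))
	var resp rpcResponse
	json.Unmarshal(rec.Body.Bytes(), &resp)
	return resp
}

func main() {
	store := &memStore{latest: 255}
	// Two replicas share one store, so both return the same answer.
	a, b := newHandler(store), newHandler(store)
	body := `{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber"}`
	fmt.Println(call(a, body).Result, call(b, body).Result) // 0xff 0xff
}
```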

Reference

Harmony Explorer v2 uses a PostgreSQL DB as its backend. A dedicated indexer reads the blockchain DB and saves the data to PostgreSQL, and the explorer backend reads from that DB to serve the frontend. This solution solved the explorer's scalability issue and can serve as a reference for the new RPC service architecture.
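The indexer pattern described above can be sketched as a resumable copy loop. This is a minimal Go sketch under stated assumptions: `Block`, `ChainSource`, and `Sink` are hypothetical simplifications of the node's blockchain DB and of the PostgreSQL tables, in-memory types stand in for both, and chain reorganizations are ignored. It is not a description of how explorer-v2-backend is actually implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a simplified, hypothetical block record; the real chain format
// (headers, transactions, receipts) is much richer.
type Block struct {
	Number   uint64
	Hash     string
	TxHashes []string
}

// ChainSource stands in for reading the node's blockchain DB.
type ChainSource interface {
	Head() uint64
	BlockByNumber(n uint64) (Block, error)
}

// Sink stands in for the relational DB (e.g. PostgreSQL tables).
type Sink interface {
	LastIndexed() (uint64, bool)
	SaveBlock(b Block) error
}

// indexOnce copies every block after the last indexed height up to the
// current head. Rerunning it resumes where the previous run stopped.
func indexOnce(src ChainSource, dst Sink) (int, error) {
	start := uint64(0)
	if last, ok := dst.LastIndexed(); ok {
		start = last + 1
	}
	count := 0
	for n := start; n <= src.Head(); n++ {
		b, err := src.BlockByNumber(n)
		if err != nil {
			return count, fmt.Errorf("read block %d: %w", n, err)
		}
		if err := dst.SaveBlock(b); err != nil {
			return count, fmt.Errorf("save block %d: %w", n, err)
		}
		count++
	}
	return count, nil
}

// memChain is an in-memory chain; it assumes at least a genesis block.
type memChain struct{ blocks []Block }

func (c *memChain) Head() uint64 { return uint64(len(c.blocks) - 1) }

func (c *memChain) BlockByNumber(n uint64) (Block, error) {
	if n >= uint64(len(c.blocks)) {
		return Block{}, errors.New("block not found")
	}
	return c.blocks[n], nil
}

// memSink mimics a blocks table plus a tx-hash -> block-number index.
type memSink struct {
	blocks    map[uint64]Block
	txToBlock map[string]uint64
	last      uint64
	has       bool
}

func newMemSink() *memSink {
	return &memSink{blocks: map[uint64]Block{}, txToBlock: map[string]uint64{}}
}

func (s *memSink) LastIndexed() (uint64, bool) { return s.last, s.has }

func (s *memSink) SaveBlock(b Block) error {
	s.blocks[b.Number] = b
	for _, h := range b.TxHashes {
		s.txToBlock[h] = b.Number
	}
	s.last, s.has = b.Number, true
	return nil
}

func main() {
	chain := &memChain{blocks: []Block{
		{Number: 0, Hash: "0xg"},
		{Number: 1, Hash: "0xa", TxHashes: []string{"0xt1"}},
	}}
	sink := newMemSink()
	n, _ := indexOnce(chain, sink)
	chain.blocks = append(chain.blocks, Block{Number: 2, Hash: "0xb", TxHashes: []string{"0xt2"}})
	m, _ := indexOnce(chain, sink)
	fmt.Println(n, m, sink.txToBlock["0xt2"]) // 2 1 2
}
```

Because the sink records the last indexed height, restarting the indexer resumes where it stopped, and lookups such as transaction hash to block number become plain table reads for the serving nodes.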

Challenges

This bounty asks for a new design of the RPC endpoint, including a new indexer, DB schema, and RPC backend service. One challenge is that the bounty hunter needs an in-depth understanding of the blockchain DB format, block records, transaction format, etc. Another challenge is that most of the RPC calls are DoEVMCalls, which may not be easy to parallelize with a single DB backend. These challenges require in-depth analysis and research of the system.

I would ask the bounty hunter to take a staged approach and do enough research before implementation. Also, please work closely with the team to clarify the DB format, and be prepared to dive deep into the EVM calls.

Acceptance Criteria

Reference

Repo of the existing explorer v2 backend/indexer. https://github.com/harmony-one/explorer-v2-backend

Reward

USD $45,000 in Harmony ONE token

JackyWYX commented 2 years ago

One very important technical difficulty lies in executing the EVM. I wonder whether there are any existing solutions that support trie-based data queries on a relational database.
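To see why trie-based queries are awkward on a relational DB, consider storing state-trie nodes as rows keyed by node hash, roughly how go-ethereum-style clients lay out trie nodes in a key-value store. The Go sketch below uses a deliberately simplified trie (not the real Merkle Patricia encoding) and a hypothetical `nodeTable` that counts row reads. It shows that a single state lookup becomes a chain of dependent point queries, one per trie level, and EVM execution may perform many such lookups per call.

```go
package main

import "fmt"

// trieNode is a deliberately simplified trie node (not Ethereum's Merkle
// Patricia encoding): a branch maps one nibble to a child's key, and a
// leaf carries a value.
type trieNode struct {
	children map[byte]string
	value    string
	leaf     bool
}

// nodeTable stands in for a hypothetical relational table such as
// trie_nodes(hash PRIMARY KEY, node), read one row at a time.
type nodeTable struct {
	rows    map[string]trieNode
	queries int
}

func (t *nodeTable) get(hash string) (trieNode, bool) {
	t.queries++ // each call would be one round trip to the DB
	n, ok := t.rows[hash]
	return n, ok
}

// lookup walks from root along the nibbles of key. Each step needs the
// previous row's result, so the reads cannot be batched up front.
func lookup(t *nodeTable, root string, key []byte) (string, bool) {
	hash := root
	for _, nib := range key {
		n, ok := t.get(hash)
		if !ok || n.leaf {
			return "", false
		}
		next, ok := n.children[nib]
		if !ok {
			return "", false
		}
		hash = next
	}
	n, ok := t.get(hash)
	if !ok || !n.leaf {
		return "", false
	}
	return n.value, true
}

func main() {
	t := &nodeTable{rows: map[string]trieNode{
		"r": {children: map[byte]string{1: "b"}},
		"b": {children: map[byte]string{2: "l"}},
		"l": {leaf: true, value: "42"},
	}}
	v, ok := lookup(t, "r", []byte{1, 2})
	fmt.Println(v, ok, t.queries) // 42 true 3
}
```

For the two-level example in `main`, one lookup costs three reads; against a shared DB over the network, each of those is a round trip.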

rlan35 commented 2 years ago

It's in the right direction, but I think the scope will be very big. It's basically building the kind of unpublished internal infrastructure Infura uses to scale its API service. Some challenges:

  1. EVM calls are dynamic, on-the-fly queries; there is no easy way to index them beforehand.
  2. How to stream events and receipts over WebSocket in real time without much delay.
  3. To make the system truly scalable, everything should be put into the DB and indexed, and the serving nodes should be stateless. But if DoEVMCall, the most costly call, cannot be indexed, then the whole purpose of this system is undermined.
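For challenge 2, one possible shape of the real-time fan-out is a hub that pushes each new event into per-subscriber buffers. This Go sketch leaves out the WebSocket transport entirely; `LogEvent`, `Hub`, and the drop-slow-subscribers policy are hypothetical choices for illustration, not Harmony's existing subscription code.

```go
package main

import (
	"fmt"
	"sync"
)

// LogEvent is a hypothetical, simplified event record.
type LogEvent struct {
	Block   uint64
	Address string
	Topic   string
}

// Hub fans out events to subscribers. In a real service each subscriber
// would be a WebSocket connection; here it is just a buffered channel.
type Hub struct {
	mu   sync.Mutex
	subs map[int]chan LogEvent
	next int
}

func NewHub() *Hub { return &Hub{subs: map[int]chan LogEvent{}} }

// Subscribe registers a subscriber with a buffer of size buf.
func (h *Hub) Subscribe(buf int) (int, <-chan LogEvent) {
	h.mu.Lock()
	defer h.mu.Unlock()
	id := h.next
	h.next++
	ch := make(chan LogEvent, buf)
	h.subs[id] = ch
	return id, ch
}

// Publish never blocks: a subscriber whose buffer is full is closed and
// removed, so one slow client cannot delay delivery to everyone else.
func (h *Hub) Publish(ev LogEvent) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for id, ch := range h.subs {
		select {
		case ch <- ev:
		default:
			close(ch)
			delete(h.subs, id)
			dropped++
		}
	}
	return dropped
}

func main() {
	h := NewHub()
	_, fast := h.Subscribe(4)
	_, slow := h.Subscribe(1)
	h.Publish(LogEvent{Block: 1, Topic: "Transfer"})
	dropped := h.Publish(LogEvent{Block: 2, Topic: "Transfer"})
	<-slow // the one event that fit in the slow subscriber's buffer
	_, open := <-slow
	fmt.Println(dropped, len(fast), open) // 1 2 false
}
```

Dropping a full subscriber keeps `Publish` non-blocking; the trade-off is that the client must reconnect and catch up, for example from the indexed DB.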

One compromise is to apply this system only to easy-to-index data, such as raw block and transaction data, with a separate serving stack for that data, while keeping the normal nodes serving EVM calls as they do today. This would reduce the load on the existing nodes. The open question is how much performance we would actually gain, because this easy-to-index data is also very cheap to serve.
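The compromise can be sketched as a method-level router in front of the two stacks. In this minimal Go sketch the contents of `indexedMethods` are an assumption chosen for illustration (Ethereum-style block and transaction lookups); everything else, including EVM calls such as `eth_call`, falls through to the existing full nodes.

```go
package main

import "fmt"

// Backend identifies which serving stack handles a method.
type Backend string

const (
	IndexedDB Backend = "indexed-db" // stateless replicas over the shared DB
	FullNode  Backend = "full-node"  // existing nodes that can execute the EVM
)

// indexedMethods lists example read-only methods backed by raw block and
// transaction data. The exact set is an assumption for illustration.
var indexedMethods = map[string]bool{
	"eth_blockNumber":          true,
	"eth_getBlockByNumber":     true,
	"eth_getBlockByHash":       true,
	"eth_getTransactionByHash": true,
}

// route sends easy-to-index reads to the DB stack and everything else
// (including EVM calls) to the full nodes, so unknown methods keep using
// the existing path.
func route(method string) Backend {
	if indexedMethods[method] {
		return IndexedDB
	}
	return FullNode
}

func main() {
	for _, m := range []string{"eth_getBlockByNumber", "eth_call"} {
		fmt.Println(m, "->", route(m))
	}
}
```

Defaulting unknown methods to `FullNode` lets methods move to the DB stack one at a time, only once their data is indexed.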