go-spacemesh API implementation

Overview

We have a robust data model design (#13), and an API design based on it (https://github.com/spacemeshos/api/). We need go-spacemesh to expose the data in the API to several classes of clients, including:

App / Wallet
Dashboard backend
Explorer backend
Backup agent (backup testnet data in a restorable way)
Marketing programs reports. e.g. rewards program top 100

Goals and Motivation

Where possible, reuse or extend existing code (collector/events/pubsub)
Minimize performance impact on go-spacemesh, including both compute and memory
Minimize new code and required architectural changes
Make it as easy as possible to reimplement the same API in future clients
Have an extensible, flexible API that we can continue to build on in the future
Avoid needing to write and maintain individual SDKs or connectors per client platform (if possible)
Support several types of IPC: in process, UNIX socket, RPC, RESTful JSON. REST support is important so that we can support "real clients" such as web and mobile apps directly without requiring middleware.
Make it as easy as possible for "heavy clients" (e.g., explorer backend/middleware) to suck all data out of a node.
Support streaming of real-time data, e.g., newly-created blocks
Support load balancing and encryption
Avoid reimplementing the API, which is already fully specified (if possible)
Other things being equal, we'd like to minimize message sizes (this is low priority for the API)

Design

Benefits of gRPC

gRPC allows us to achieve all of these goals, in addition to having the following other niceties:

we get serialization/deserialization of our types for free; no need to use JSON
clients receive natively typed objects rather than free-form JSON
grpc-gateway provides automatic, transparent conversion to REST-JSON (we're already using this)
streaming is pretty cool
we get auto-generation of SDKs for many languages - no need to build our own (“the ability to move from publishing APIs and asking developers to integrate with them, to releasing SDKs and asking developers to copy-paste example code written in their language”)
we can easily spec endpoints; this spec is shared between client and server, and among many client and server implementations; allows the succinct definition of endpoint contracts
native support in NGINX would make load balancing and TLS pretty easy (https://www.nginx.com/blog/nginx-1-13-10-grpc/). Native TLS support is very nice to have.
supports many platforms, and the list is growing; very active development
gRPC supports a rich error model
we're already using it in go-spacemesh

Downsides to/limitations of gRPC

By default gRPC has a maximum message size limit of 4mb, but this can be increased pretty easily. (We already ran into this once.) I don't foresee any major design or implementation challenges as a result of this.

In theory, the RPC design pattern requires tighter coupling between the client and the server than pubsub (which is very loosely coupled: the publisher doesn't even need to know of the subscriber's existence). In practice I don't think this will be an issue for us since I think we can use the existing pubsub-based events framework for all events, which can be delivered to pubsub subscribers and/or to the API streams transparently.

Proposed Implementation

replace the existing gRPC API in go-spacemesh with the new, complete API specification.
integrate the API, and in particular, the API streams, with the existing events framework, so that events are sent on both the existing pubsub bus as well as on the new API streams

gRPC vs. existing pubsub framework

pubsub is a low-level message-passing protocol that allows a set of events, such as "new block", "block valid", "new ATX", "reward received", "created block", etc., to be broadcast to any number of subscribers. It's currently being used in a multi node test that allows many node instances to share data very rapidly in order to simulate a network in fast-forward. It can hypothetically be used to pass these same events to downstream clients such as for analytics purposes, or a block explorer.

However, being a low-level protocol, pubsub is missing a lot of the features that we get for free with gRPC, so this sort of use case would require considerable additional effort: developing SDKs/connectors for the clients, handling type conversions, clearly defining the protocol, load balancing, and encryption. Also, pubsub would not support certain required use cases well, such as web/mobile clients.

Finally, many of our API endpoints are in fact remote procedure calls: they take arguments, cause the backend to perform some action, and return some value. This use case is not natively supported in pubsub.

We can pretty easily implement and emulate all of the features of pubsub using gRPC streams—in fact, this design work is already done.

Implementation plan

See https://github.com/spacemeshos/go-spacemesh/issues/1764

Dependencies and Interactions

Dependencies:

data provision from many existing go-spacemesh packages
existing go-spacemesh events/collector framework
existing go-spacemesh API/grpc server framework
existing go-spacemesh grpc-gateway server framework
https://github.com/spacemeshos/api/

Interactions:

Existing event/collector/pubsub code
Other code that triggers new events as defined in the API (not already covered by the existing events framework)
Many downstream clients - this is more relevant for the API design (#13) than for implementation

Stakeholders and Reviewers

go-spacemesh API/events/collector/pubsub code: @antonlerner
spacemesh app (API consumer): @IlyaVi
API for analytics/telemetry: @ilans
contractor team working on explorer/dashboard backend (one of the first API consumers)

Testing and Performance

Testing: Existing API tests will be rewritten, and expanded, to work with the new API code. New tests will be written for any new functionality added, e.g., grpc streams.

Performance: We may want to do some profiling/performance/stress tests to make sure that the new API code, especially events/streams, does not have a negative performance impact on go-spacemesh. Per @antonlerner, we should also test how many simultaneous connections grpc supports.

I think we should separate two purposes of the API, first, is to actually provide an API to Node functions such as GetBalance, ChangeRewardAddress... etc... This I agree that could be changed to use GRPC API.

The other functionality that is discussed here is the "Events" functionality. this will probably not be used by end users which want to connect with the Node, and will serve building apps such as block explorer and dashboard. IMO this is a separate requirement and should be designed a bit different. I think it is best to implement the events framework using the pubsub because it is more robust and will allow more flexibility than GRPC streams in terms of subscribing and un subscribing to different topics on the fly. The pubsub can be used internally in the same process as well as externally from other processes, @ilans has mentioned he wishes to incorporate such framework into the node anyways. Also, as I understand there will be a stream in the API that will serialise either all or several events into one stream, on the development side, this IMO, will require more development effort in the sense that now all these events must be serialised from different parts of the code again. Last, i think it's important to see how many simultaneous connections the GRPC steream can support and what are the software bottlenecks of this solution and compare them to current implementation

Having said that, I'd be happy to discuss and see if we could use streams and get same level of robustness and flexibility using GRPC streams, if you indeed think it is better to implement events that way @lrettig @avive

Also note that as part of this change, I think we should also address the local testnet events issue Currently, the local testnet relies on logs to monitor the network and print network status, this can and should be changed to get data using the correct data endpoint

@antonlerner thanks for taking a look and for the thorough reply! Your timing is great :) I'm working on the non-stream API endpoints for now, and haven't begun implementation of the streams yet.

To respond to a few of your points:

The other functionality that is discussed here is the "Events" functionality. this will probably not be used by end users which want to connect with the Node, and will serve building apps such as block explorer and dashboard. IMO this is a separate requirement and should be designed a bit different.

While I agree there's an important distinction between "one-off" endpoints and the streams, I'm not entirely sure the streams won't be used by end users--e.g., I'm pretty sure that @avive and @IlyaVi plan to subscribe to events in the wallet and to use this to display account-related events to the user, e.g., incoming transactions, rewards, etc. This may be cleaner and easier than polling the node from a design perspective. I think @avive has stronger thoughts on this.

I think it is best to implement the events framework using the pubsub because it is more robust and will allow more flexibility than GRPC streams in terms of subscribing and un subscribing to different topics on the fly.

Curious to hear more about why you feel that pubsub is more robust and makes it easier to subscribe and unsubscribe from different topics.

Also, AFAICT the two are not necessarily mutually exclusive - I think we could have the same set of events exposed using the existing pubsub framework, or grpc streams, or both (modulo questions about serialization and multiplexing, as you point out). I haven't gotten deep enough into the implementation yet to know with confidence.

it's important to see how many simultaneous connections the GRPC stream can support and what are the software bottlenecks of this solution and compare them to current implementation

Totally agree. Would appreciate your advice on how to test these!

Currently, the local testnet relies on logs to monitor the network and print network status, this can and should be changed to get data using the correct data endpoint

Another good point - would love to hear thoughts from @ilans on this. Does the API design as it stands contain the correct endpoints? And is there a preferred protocol for consuming these data?

While I agree there's an important distinction between "one-off" endpoints and the streams, I'm not entirely sure the streams won't be used by end users--e.g., I'm pretty sure that @avive and @IlyaVi plan to subscribe to events in the wallet and to use this to display account-related events to the user, e.g., incoming transactions, rewards, etc. This may be cleaner and easier than polling the node from a design perspective. I think @avive has stronger thoughts on this.

It all depends on how it's implemented, if events are raised only as part of the node running, in order for you to get all data you'd need to restart the node and sync from genesis.

Curious to hear more about why you feel that pubsub is more robust and makes it easier to subscribe and unsubscribe from different topics.

As I understand it, the GRPC stream will give you all the data in the mesh in a single stream, so you can't have only part of the data without deserialising it first. the other way around it is to create another endpoint for each datatype and / or identity type (i.e account, node etc...) pubsub can make it more robust in terms that each of the identities and data types can be made a topic and allow much flexible querying and filtering

Also, AFAICT the two are not necessarily mutually exclusive - I think we could have the same set of events exposed using the existing pubsub framework, or grpc streams, or both (modulo questions about serialization and multiplexing, as you point out). I haven't gotten deep enough into the implementation yet to know with confidence.

Can we map out all uses for the streams we know of? this will help us understand how many endpoints we will need to support and whats the equivalent effort (topic selection) we would need to have on our pubsub. I think this will also help us choose better between the two.

Totally agree. Would appreciate your advice on how to test these!

we can read the grpc stream code... also, another advantage of mapping the required endpoints will tell us how many streams will be simultaneously opened and active when querying the node.

We have designed the api around services and 3 clients and did the big code review of the api around those services - e.g. node, mesh, global-state, transactions service and we've implemented all review suggestions. I feel that we have a good design and I see little reason to separate facets differently. Everything is mapped out in the current grpc service definitions of these services so I don't understand the ask for mapping things out. All clients use different kind of methods to get what they want - current data, streams for future data and queries for historical data. The wallets definitely need streams so they can stop polling the node on loop like what is done today in smapp which is bad and very wasteful. Also, streams do not give all the data in the mesh in a single stream - what they return depends on what they were defined to return based on the user input filters.

how will one subscribe to new data from the stream? also, was this review done with @ilans? He also wants to have certain probes inside nodes and receive some data/events from

Quick update here, I've begun implementing streams (https://github.com/spacemeshos/go-spacemesh/pull/2061). I created a new singleton struct that basically just stores a list of channels, one per data type that we care about. In the places where events.Publish() is now being called, I'm adding a second call to publish the data element onto the appropriate channel. The grpc endpoint backends listen to the channels they care about (which can be specified by a Filter that's passed in).

I considered integrating into the existing events/pubsub framework but did not for several reasons:

it seems odd to pass this data via TCP when it can more easily be communicated in-process
it seems wasteful to serialize/deserialize the data when, again, they can be passed in-process via go channels
for this reason, rather than having a single Event interface, since I'm not serializing everything down to byte streams, I'm just keeping it simple and using a separate channel/stream function for each data type
only a portion of the data is communicated via pubsub, not the entire data object (e.g., the ATX does not contain all of its fields)

To be clear, I'm talking specifically about how the API backend is implemented internally, not about how data is collected/published externally. I haven't touched the existing pubsub code, and it's likely this API code will be totally orthogonal to it.

spacemeshos / SMIPS

SMIP: go-spacemesh API implementation #21