stellar / soroban-rpc

RPC server for Soroban contracts.

add the ability for a user to iterate through transactions #61

Closed mollykarcher closed 7 months ago

mollykarcher commented 8 months ago

What problem does your feature solve?

Horizon will cease to serve txmeta sometime in Q2 (see discord discussion) due to data storage size concerns of Soroban transactions. The ecosystem needs an alternative to this, and we want that alternative to be the RPC.

What would you like to see?

Allow a user of the RPC to iterate through transactions within its transaction window; each of those transactions should serve all appropriate txmeta related to it.

What alternatives are there?

mollykarcher commented 8 months ago

@janewang do you have an opinion on the "how" here? That is, websocket/pubsub vs polling/pagination

janewang commented 8 months ago

I suggest streaming pub/sub as Horizon currently implements it.

sreuland commented 8 months ago

I suggest streaming pub/sub as Horizon currently implements it.

Hello @janewang, @jake commented on the same aspect of streaming as a future-scope RPC feature in the Stellar RPC Product Specification. I posted a follow-on reply there with more detail, which I can paraphrase here:

Streaming protocols are great for clients in distributed systems that need to handle 'infinite' data, and they promote event-driven architecture with decoupled microservices that can derive their own custom data models from the root source-of-truth stream.

Horizon's streaming is based on HTTP SSE (server-sent events). It is one-way: a single client request opens a single ongoing server push over the same HTTP connection. This could fit RPC streaming.

SSE has some limitations as a streaming mechanism. It is one-way and somewhat lossy: any connection disruption disconnects the client, which must then open a new HTTP connection and may have missed messages in the interim, unless the endpoint supports cursors and the client code maintains that cursor.
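A minimal sketch of the cursor idea above, using a simulated in-memory event source rather than a real SSE endpoint. The names `EVENT_LOG`, `stream_events`, and `consume_with_cursor` are hypothetical. In real SSE the resume point is typically carried in the standard `Last-Event-ID` request header; here it is just a function argument.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical server-side log: (event_id, payload) pairs in order.
EVENT_LOG: List[Tuple[int, str]] = [(i, f"event-{i}") for i in range(1, 8)]


def stream_events(
    last_event_id: Optional[int], drop_after: Optional[int] = None
) -> Iterator[Tuple[int, str]]:
    """Simulate one streaming connection.

    Yields events with id > last_event_id and raises ConnectionError once
    `drop_after` events have been sent, to mimic a dropped connection.
    """
    sent = 0
    for event_id, payload in EVENT_LOG:
        if last_event_id is not None and event_id <= last_event_id:
            continue
        if drop_after is not None and sent >= drop_after:
            raise ConnectionError("connection lost")
        yield event_id, payload
        sent += 1


def consume_with_cursor() -> List[int]:
    """Client loop that remembers the last seen id and resumes from it."""
    cursor: Optional[int] = None
    seen: List[int] = []
    # The first two connections drop mid-stream; the third runs to the end.
    for drop_after in (3, 2, None):
        try:
            for event_id, _payload in stream_events(cursor, drop_after):
                seen.append(event_id)
                cursor = event_id  # persist the cursor after each event
        except ConnectionError:
            continue  # reconnect on the next iteration, resuming from cursor
    return seen
```

Without the cursor, each reconnect would either replay from the start or silently skip whatever was sent while the client was disconnected.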

A level up from SSE is websockets, which work in browser JS; that matters because RPC needs to support browser clients. Websockets allow two-way messaging, but I'm not suggesting that RPC use them directly. Rather, an option to consider for more robust streaming from RPC to clients is to offload the websocket responsibility to an external MQ broker that implements its protocol over websockets, such as MQTT. RPC then doesn't need to provide any streaming infrastructure: it could be configured as an MQ client that does pub/sub with the broker, and client apps would pub/sub with the broker using a client SDK such as MQTT.js. This way the client system is responsible for provisioning the MQ broker instance, and the broker manages the websocket load.

Clients can leverage MQTT protocol features like subscription topic filtering to build streams that filter down to specific messages, for example by contract id. RPC getEvents already has the notion of topic filters, and they were loosely modeled after the MQTT pattern. Clients can also use the MQ protocol's messaging QoS and persistent sessions to achieve advanced delivery semantics, such as exactly-once delivery, which is much more robust than polling, SSE, or plain websockets.
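To illustrate the topic filtering mentioned above, here is a minimal sketch of MQTT-style wildcard matching, where `+` matches exactly one topic level and `#` matches all remaining levels. The topic layout `soroban/events/<contract id>/<event type>` and the contract ids are purely hypothetical, and the function is a simplification (it ignores rules such as the special handling of topics starting with `$`); it is not how a broker or getEvents actually implements filtering.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches `topic_filter` under MQTT wildcard rules.

    '+' matches exactly one level. '#' matches any number of remaining
    levels (including none) and is only valid as the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Valid only in the final position; then it matches the rest.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # No multi-level wildcard was used, so the depths must be equal.
    return len(filter_levels) == len(topic_levels)


# Hypothetical usage: everything from one contract, or one event type
# across all contracts.
by_contract = topic_matches("soroban/events/CABC/#", "soroban/events/CABC/transfer")
by_type = topic_matches("soroban/events/+/transfer", "soroban/events/CXYZ/transfer")
```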

janewang commented 8 months ago

As suggested in the spec, websocket is where we are planning to go eventually. We do need the tx meta changes to be out by protocol 21, as quite a few ecosystem dependencies rely on this and Horizon will turn off the tx meta flag in protocol 21. If there are no objections to the timeline, I'm supportive of a websocket implementation.

tomerweller commented 8 months ago

Should we rename this issue to add the ability for a user to iterate through transactions? Iterating through txs (not only meta) is an important feature and I assume we'll be using the existing transaction resource (which includes metas). Is there a need to separate?

On API design, I think that we should definitely have a regular json-rpc method (getTransactions?) to iterate over transactions (I assume its shape will be similar to whatever we currently use for events).

Do we really need a streaming API on top of that? Updates happen only every ~5s after ledger closes, you can potentially just poll the above method.
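A rough sketch of what polling a paginated method could look like. Everything here is an assumption for illustration: `FakeRpc` is an in-memory stand-in, and the `get_transactions(cursor, limit)` signature and its return shape are hypothetical, not the actual RPC API.

```python
from typing import Dict, List, Optional, Tuple


class FakeRpc:
    """In-memory stand-in for an RPC server; not a real client."""

    def __init__(self) -> None:
        self.txs: List[Dict] = []

    def close_ledger(self, sequence: int, tx_count: int) -> None:
        """Simulate a ledger close that adds `tx_count` transactions."""
        for i in range(tx_count):
            self.txs.append({"ledger": sequence, "hash": f"tx-{sequence}-{i}"})

    def get_transactions(self, cursor: Optional[int], limit: int) -> Dict:
        """Hypothetical paginated method: up to `limit` txs after `cursor`."""
        start = 0 if cursor is None else cursor
        page = self.txs[start:start + limit]
        return {"transactions": page, "cursor": start + len(page)}


def drain(
    rpc: FakeRpc, cursor: Optional[int], limit: int = 2
) -> Tuple[List[Dict], Optional[int]]:
    """Fetch pages until an empty page comes back; return (txs, new_cursor)."""
    collected: List[Dict] = []
    while True:
        result = rpc.get_transactions(cursor, limit)
        cursor = result["cursor"]
        if not result["transactions"]:
            return collected, cursor
        collected.extend(result["transactions"])


# Two polling rounds, with a ledger closing in between.
rpc = FakeRpc()
rpc.close_ledger(100, 3)
first, cursor = drain(rpc, None)
rpc.close_ledger(101, 2)
second, cursor = drain(rpc, cursor)
```

A real poller would call `drain` on a timer of roughly the ledger close interval and persist the cursor between calls.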

Turning on websockets should theoretically be easy as JSON-RPC is transport agnostic (and seems like our library has a relevant ws module). Though not sure what the lift is of adding streaming on top of that.

Shaptic commented 8 months ago

I'm not committed to this opinion but I'll offer it up in contrast to facilitate discussion:

Are we sure we want to commit to websockets when we don't have a compelling reason to have bidirectional communication? They are more demanding relative to SSEs and come with their own add'l complexities: e.g. defining a schema on top of it and the lossy/reconnect issue still being there. It also opens a permanent connection and I'm not sure how that works with load balancing because they're stateful. They also will be mostly idle and only do something every ~5s (ledger close). So it feels like overkill relative to SSEs.

I do like the idea of only providing streaming data in the form of an MQ: as you said it's very robust and keeps things simpler on our end as well. However, I'm worried it might be too demanding for downstreams to get started quickly? But honestly there's libraries for everything these days (like @sreuland pointed out with mqtt.js!) But it would be annoying to maintain multiple ways to stream stuff so going for robustness could be the right answer.

sreuland commented 8 months ago

Yes, SSE is lighter weight and proven in action with Horizon. Plain websockets with no other protocol on top are subject to the same potential lossiness from connection interruption as SSE, and to @Shaptic's point, RPC requirements don't call for two-way communication: the server only pushes data to clients. Websockets probably only make sense if we think it's worthwhile to make the extra leap to an MQ protocol on top of websockets, like MQTT, to gain reliable message delivery for streaming events, tx-meta, and transactions.

SSE and websockets can both be challenging to deploy under higher client volumes, since each client holds a long-running HTTP connection on the server (and through any load balancers), which usually entails non-trivial scaling architecture on the back-end servers. The idea of isolating streaming under an external MQ is that the broker deployment deals with this scaling and provides its reliable messaging protocol on top. It doesn't conflate streaming requirements with the RPC server deployment. If RPC hosts a streaming service, we will likely need to add more advanced RPC deployment requirements/contingencies based on the expected number of clients, unrelated to pure RPC runtime requirements.

leighmcculloch commented 8 months ago

🤔 A meta streaming API for RPC is somewhat at odds with the goals RPC has specified in the specification: The RPC has a small window of data access, 7 days only. Applications that stream data, rather than selecting narrow segments of data, probably won't tolerate gaps in data. If those applications won't tolerate gaps in data, and RPC can't fill gaps in streamed data, then integrating it as the way to stream meta could be challenging. I'm sure streaming meta would still be useful, but it may set folks up to fail. How would they fill gaps?

What are the use cases for how streaming meta will be used in RPC?

I think folks building applications that need to stream meta, probably need to depend on something where they can fill gaps. Maybe this is something the RPC could do in the future, efficiently, but it isn't in the goals at the moment.
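To make the gap concern concrete, here is a back-of-the-envelope sketch. It assumes the 7-day window and the roughly 5-second ledger close time mentioned in this thread; the function name and the example ledger numbers are hypothetical.

```python
LEDGER_CLOSE_SECONDS = 5  # approximate close time, per the discussion above
RETENTION_DAYS = 7        # RPC data window cited above

# Approximate number of ledgers that fit in the retention window.
RETENTION_LEDGERS = RETENTION_DAYS * 24 * 60 * 60 // LEDGER_CLOSE_SECONDS


def missed_ledgers(last_seen: int, latest: int) -> int:
    """Count ledgers after `last_seen` that have already aged out of the window.

    These are ledgers a reconnecting client would need but could no longer
    fetch from a server that only retains the most recent window.
    """
    oldest_available = latest - RETENTION_LEDGERS + 1
    next_needed = last_seen + 1
    return max(0, oldest_available - next_needed)
```

Under these assumptions the window holds about 120,960 ledgers, so a client that was offline for longer than that has a gap it cannot fill from the RPC alone.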

leighmcculloch commented 8 months ago

The existing getTransaction endpoint already returns meta, which is great, so applications can already extract meta for transactions they've submitted. And many of the folks chatting here seemed to indicate they're getting meta for specific transactions, not streaming meta of all time.

leighmcculloch commented 8 months ago

Allow a user of the RPC to iterate through transactions within its transaction window

It does seem like getTransactions that @tomerweller suggests above, maybe with a ledger number as input, would allow a developer to iterate over transactions within a transaction window.

Transactions are announced in bursts, one burst per ledger close. Maybe the stream shouldn't be oriented around transactions or meta at all, and should instead be a stream of ledgers. The goal would be to subscribe and be notified of new ledgers. Then if you want to follow transactions, you watch the stream of ledgers, and when a new ledger is announced, you go and retrieve the transactions for that ledger. You might even request multiple pages of transactions for a single ledger concurrently* rather than serially, and decoupling the stream from the data helps with that.

* Although the cursor based pagination that the RPC supports today probably doesn't support concurrent page collection.
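A minimal sketch of the "stream of ledgers, then fetch" idea. It assumes page fetches are addressable by page number (not by cursor, per the footnote above) and that the ledger notification carries a transaction count; `LEDGER_TXS`, `fetch_page`, and `on_new_ledger` are hypothetical stand-ins, not RPC methods.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

PAGE_SIZE = 2

# Hypothetical store: ledger sequence -> transaction hashes in that ledger.
LEDGER_TXS: Dict[int, List[str]] = {
    200: ["a", "b", "c", "d", "e"],
    201: ["f"],
}


def fetch_page(ledger: int, page: int) -> List[str]:
    """Hypothetical page fetch for one ledger, addressed by page number."""
    start = page * PAGE_SIZE
    return LEDGER_TXS[ledger][start:start + PAGE_SIZE]


def on_new_ledger(ledger: int, tx_count: int) -> List[str]:
    """Handle a ledger notification by fetching all of its pages concurrently."""
    page_count = (tx_count + PAGE_SIZE - 1) // PAGE_SIZE  # ceiling division
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() yields results in submission order, so page order is kept.
        pages = list(pool.map(lambda p: fetch_page(ledger, p), range(page_count)))
    return [tx for page in pages for tx in page]
```

Concurrent fetching only works here because each page can be requested independently; with opaque cursors, page N+1 cannot be requested until page N has returned.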

janewang commented 8 months ago

It does seem like getTransactions that @tomerweller suggests above, maybe with a ledger number as input, would allow a developer to iterate over transactions within a transaction window.

Yes, agreed and please see description in the specification.

getTransactions: Return all transactions available on the RPC with an optional filter which returns a filtered array of transactions

janewang commented 8 months ago

Implementing websocket in the RPC is a potential feature for future development. There is no requirement to implement it now, and I believe a more thorough assessment of the RPC websocket methods is required to fully scope this functionality on the product side, along with an evaluation of the engineering design from eng.

In the immediate term, in light of the upcoming sunset of tx metas in Horizon, our primary focus for Q1 is to provide an alternative solution and facilitate the transition away from using tx metas in Horizon.

sreuland commented 8 months ago

getTransactions: Return all transactions available on the RPC with an optional filter which returns a filtered array of transactions

Once that's added to RPC, we could suggest that js-stellar-sdk abstract the long polling of cursor-based pages of getTransactions behind a more standard programmatic streaming interface, such as a subscription class and callback handler.

With the JS SDK subscription abstraction, we could also later move RPC to an actual streaming protocol like SSE if we see the value: the JS SDK could update its abstraction to that transport without introducing a breaking change for clients, since they would continue using the subscription/callback interface.
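A small sketch of that subscription/callback shape, written in Python for illustration only; it is not the js-stellar-sdk API. The `Subscription` class, the `Fetcher` type, and `fetch_from_log` are hypothetical. The point is that the caller only sees a callback, so the fetcher could later be swapped from polling to a streaming transport.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes the current cursor and returns (new_items, next_cursor).
Fetcher = Callable[[Optional[int]], Tuple[List[Dict], int]]


class Subscription:
    """Callback-style interface that hides how new items are obtained."""

    def __init__(self, fetch: Fetcher, on_item: Callable[[Dict], None]) -> None:
        self._fetch = fetch
        self._on_item = on_item
        self._cursor: Optional[int] = None

    def poll_once(self) -> int:
        """Fetch whatever is new, deliver it to the callback, return the count."""
        items, self._cursor = self._fetch(self._cursor)
        for item in items:
            self._on_item(item)
        return len(items)


# Hypothetical polling-backed fetcher over an in-memory transaction log.
log: List[Dict] = [{"hash": "tx-1"}, {"hash": "tx-2"}]


def fetch_from_log(cursor: Optional[int]) -> Tuple[List[Dict], int]:
    start = cursor or 0
    return log[start:], len(log)


received: List[str] = []
sub = Subscription(fetch_from_log, lambda tx: received.append(tx["hash"]))
sub.poll_once()                # delivers tx-1 and tx-2
log.append({"hash": "tx-3"})   # a new transaction arrives
sub.poll_once()                # delivers only tx-3
```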

I think folks building applications that need to stream meta, probably need to depend on something where they can fill gaps.

There's a new component, LedgerExporter, coming up as part of the composable data initiative. It's currently under development; its purpose is to serialize tx-meta for a ledger range (or unbounded) and export it to pluggable destinations such as file stores or an MQ broker. I wonder if this may be an alternative to consider further out, recommended to apps as a complement to RPC when they want ranges of meta beyond the RPC's window.

janewang commented 7 months ago

For getTransactions, could we add the ability to filter, similar to how we filter in getEvents?

Btw, as a quick follow on, I opened a ticket in stellar-docs for documenting this endpoint: https://github.com/stellar/stellar-docs/issues/357

mollykarcher commented 7 months ago

Closing in favor of https://github.com/stellar/soroban-rpc/issues/110