paritytech / json-rpc-interface-spec


explicit guarantees on the `finalized` event of `submitAndWatch` #76

Closed · josepot closed 1 year ago

josepot commented 1 year ago

While working on the higher-level abstraction for submitting transactions, I realized that if the node also implements the chainHead functions, I don't need to stress about the finalized event in transaction_unstable_submitAndWatch. I'd rather have one reliable source of information than juggle two that need to sync.

What I really need is to be sure that when the finalized event happens, the block in that event matches the latest one from the bestChainBlockIncluded event. So, I'd like the spec to be clear about this. It just makes things simpler and more foolproof.

cc: @tomaka

EDIT: I'd also like to have the guarantee that if the node implements the chainHead functions, the bestChainBlockIncluded event from submitAndWatch will come before the chainHead finalized event which includes that block in its list of finalizedBlockHashes. (I just added a second commit with a note to be explicit about this guarantee).

josepot commented 1 year ago

I mean, if worst comes to worst and it's not possible to ensure this guarantee in the spec itself, then I will ensure that this guarantee is met at the low-level library that I've implemented for interacting with this JSON-RPC API. However, that feels a bit like a hack to me.

lexnv commented 1 year ago

IIUC, high-level APIs might need to synchronize the events produced by the transaction functions and the chainHead functions.

For example, the chainHead side might produce events (in this case, the finalized event) before the transaction side produces the bestChainBlockIncluded event. In this case, the transaction class is lagging behind the chainHead. I might not have the context wrt capi; would this be an issue because capi exposes a single stream of information back to the user? 🤔

I realized that if the node also implements the chainHead functions, I don't need to stress about the finalized event in transaction_unstable_submitAndWatch

Yep that makes sense. If both are present, capi is inspecting the extrinsics of the chainHead::finalized event and is able to determine if the tx is finalized sooner.

I always viewed the RPC-V2 classes of functionality as separate entities. This connects the two classes when both are implemented, making the implementation a bit complicated. I'm not entirely sure how we would intertwine the two APIs without complicating the code, since there are multiple variables that might lead to one API lagging a bit behind the other.

One possible way to meet this need may be to introduce a chainHead_tx method; I'd love to hear your use case and thoughts 🙏

tomaka commented 1 year ago

transaction_watch does three things: it validates the transaction, gossips it over the peer-to-peer network (so that it might potentially be included in the chain), then watches the transactions included in the body of each block in order to try to determine the status of that transaction.

The third step (watching blocks) could also be done manually using the chainHead functions. It's simply about downloading the body of each block, then checking whether the body includes the transaction in question. There's no trickery or anything: you just have to compare each element of the body with the bytes that were sent to transaction_watch.
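To make that concrete, here's a minimal sketch of such a manual check from the JSON-RPC client's side. The `rpc` helper is hypothetical, and the sketch glosses over the fact that in the actual API the result of `chainHead_unstable_body` arrives asynchronously as an event on the follow subscription:

```ts
// Hypothetical helper: performs a JSON-RPC call. For chainHead_unstable_body
// it is assumed to resolve once the body operation's result has arrived on
// the follow subscription (the real API delivers that result asynchronously).
type Rpc = (method: string, params: unknown[]) => Promise<any>;

// True if the submitted transaction appears in the block's body: a plain
// byte-for-byte (here, hex-string) comparison of each extrinsic against
// exactly what was sent to transaction_watch.
async function blockContainsTx(
  rpc: Rpc,
  followSubscription: string,
  blockHash: string,
  submittedTxHex: string,
): Promise<boolean> {
  const extrinsics: string[] = await rpc("chainHead_unstable_body", [
    followSubscription,
    blockHash,
  ]);
  return extrinsics.some((ext) => ext === submittedTxHex);
}
```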

In order to implement that third step, smoldot takes an opinionated decision: if smoldot finds the transaction in the body of block A, but can't download the body of block A's parent, it does not immediately report the transaction as being in block A. Instead, it waits until it is able to determine that the transaction is not in block A's parent before reporting that the transaction is in block A. If instead it finds the transaction in block A's parent, then it reports that the transaction is in block A's parent. The overwhelming majority of transactions can only be inserted once in a chain, and it often doesn't really matter exactly where it is included, but smoldot is low-level and doesn't actually know that, and thus takes this opinionated decision.
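A rough sketch of that heuristic as described (not smoldot's actual code); `parentOf` and `bodyContainsTx` are assumed helpers, and smoldot would wait and retry where this sketch would simply fail if a body can't be downloaded:

```ts
// Walk towards genesis for as long as the parent block also contains the
// transaction, so that only the earliest provable inclusion is reported.
async function earliestInclusion(
  bodyContainsTx: (blockHash: string) => Promise<boolean>, // assumed helper
  parentOf: (blockHash: string) => Promise<string>, // assumed helper
  blockWithTx: string, // a block already known to contain the transaction
): Promise<string> {
  let candidate = blockWithTx;
  for (;;) {
    const parent = await parentOf(candidate);
    // Only report `candidate` once its parent is known NOT to contain the
    // transaction; if the parent does contain it, the parent becomes the
    // new candidate and the walk continues.
    if (!(await bodyContainsTx(parent))) return candidate;
    candidate = parent;
  }
}
```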

However, this entire new API is designed around the idea that the JSON-RPC server shouldn't take opinionated decisions when the JSON-RPC client is able to know better, and in this situation the JSON-RPC client is indeed able to know better. So my take on this problem would be: it should be the job of the JSON-RPC client to watch whether the transaction has been included.

On the other hand, we kinda want the JSON-RPC API to be easy to use from CLI tools, so I'm not sure that it's a good idea to completely drop the "watching blocks" part of transaction_watch, and I think that the solution could be a new JSON-RPC function that only validates and gossips but doesn't watch. However, the JSON-RPC server-side implementation still needs to know when to stop gossiping transactions. This is normally once the transaction has been included, but if the JSON-RPC server doesn't know when a transaction is included, it can't do that automatically.

josepot commented 1 year ago

The third step (watching blocks) could also be done manually using the chainHead functions. It's simply about downloading the body of each block, then checking whether the body includes the transaction in question

Yeah, I know. That's why I said:

I mean, if worst comes to worst and it's not possible to ensure this guarantee in the spec itself, then I will ensure that this guarantee is met at the low-level library that I've implemented for interacting with this JSON-RPC API.

My concern is the redundancy. If the spec provided clear assurances about the timing and concurrency of these messages, there would be no need for me to replicate this effort.

The overwhelming majority of transactions can only be inserted once in a chain, and it often doesn't really matter exactly where it is included

Most transactions can only be inserted once in a chain, but the exact location of their inclusion can be vital. Here's why:

For many dApp developers, when a new finalized block is known, the aim is to immediately understand all potential state changes associated with that block. This includes all events, storage calls, and any pending transactions contained within, alongside their errors and/or events. The CAPI client intends to make this process easier by providing an API in which, if the developer receives a message about a block, it means that all relevant aspects of that block have already been resolved.

Different UIs might use this information in different ways. Some may hold off on showing users the new finalized block until all possible state changes have been computed. Others might show the block immediately but label certain information as "pending" until all state updates are confirmed.

The user experience suffers when the UI rapidly updates and changes without the user having had time to digest those changes. To address this, I'll be refining the work on my low-level library tailored for the JSON-RPC API. However, I still believe it would be ideal if these guarantees were part of the spec itself.

jsdw commented 1 year ago

EDIT: I'd also like to have the guarantee that if the node implements the chainHead functions, the bestChainBlockIncluded event from submitAndWatch will come before the chainHead finalized event which includes that block in its list of finalizedBlockHashes. (I just added a second commit with a note to be explicit about this guarantee).

Offhand, I think I'd want the opposite guarantee. If I'm subscribed to blocks via chainHead_follow, and then I submitAndWatch some transaction, then in my ideal world the finalized/inBlock etc notifications emitted from submitAndWatch would always come after I've already heard about that block hash from chainHead_follow.

The reason being that chainHead_follow is what determines which blocks are "pinned", ie which blocks I'm able to request details for, and so if I see some block hash mentioned in submitAndWatch it'd be nice to know that I can actually request details for that block without hitting any "block isn't pinned" errors.

So my take on this problem would be: it should be the job of the JSON-RPC client to watch whether the transaction has been included.

I guess the advantage of submitAndWatch doing it "server-side" is that, at least when talking to full nodes, the client doesn't have to download all of the block bodies to find out when a tx is included. Of course, in the "light client first" world, the block bodies will all be downloaded by Smoldot anyway by the sounds of it.

The overwhelming majority of transactions can only be inserted once in a chain, and it often doesn't really matter exactly where it is included

Most transactions can only be inserted once in a chain, but the exact location of their inclusion can be vital

I guess the point here is that in rare cases it's possible to have the exact same transaction included in more than one block, and Smoldot makes sure to return the first block that such a transaction was included in at the expense of maybe being a little slower to guarantee this.

In such cases, there is no "exact location" of the transaction, right? It was simply the case that I submitted a transaction, and then later that same transaction appeared in eg two separate blocks. My action could have led to either being true, and I don't know which. (Perhaps I'm totally wrong here but that's my understanding so far!)

tomaka commented 1 year ago

What I really need is to be sure that when the finalized event happens, the block in that event matches the latest one from the bestChainBlockIncluded event.

To answer this specific point: the way it's designed right now is that finalized implicitly means bestChainBlockIncluded.

Either we keep it this way, or we remove the fields of finalized. The in-between that you suggest IMO doesn't really make sense, as you basically make it mandatory for the same information to always be sent twice from the server to the client.

josepot commented 1 year ago

Either we keep it this way, or we remove the fields of finalized.

I'd like to propose going one step further and eliminating the events bestChainBlockIncluded and finalized entirely from the transaction_unstable_submitAndWatch function. In this proposal, broadcasted would stand as the concluding event if the transaction is valid with no errors.

Here's my rationale:

  1. Limited Utility: As of now, only nodes capable of implementing the chainhead functions can generate these two events. These events, in essence, are drawn from chainHead_unstable_follow. Their usefulness to consumers depends heavily on our ability to provide guarantees regarding the timing of their delivery relative to their corresponding chainhead events.

  2. Lack of Agreement on Guarantees: We haven't reached a consensus on the best set of guarantees for the delivery of these events. This lack of clarity might lead to inconsistencies and potential issues in the future.

  3. Undesired Coupling: The current spec implicitly couples two distinct function groups. In the interest of modularity and to prevent unforeseen dependencies, I would rather keep the two groups of functions decoupled.

So, yeah... After reading all your comments I've come to realize that it would be best not to have these 2 events being produced from transaction_unstable_submitAndWatch.

tomaka commented 1 year ago

In this proposal, broadcasted would stand as the concluding event if the transaction is valid with no errors.

As I've explained above, this is not possible. The JSON-RPC server must continue broadcasting all transactions periodically until they have been included in the finalized chain. Broadcasting has no actual guarantee of delivery, a bit like a UDP message.

Removing the tracking of blocks is possible only if it becomes the responsibility of the JSON-RPC client to inform the server when the transaction has been finalized.

josepot commented 1 year ago

EDIT: I'd also like to have the guarantee that if the node implements the chainHead functions, the bestChainBlockIncluded event from submitAndWatch will come before the chainHead finalized event which includes that block in its list of finalizedBlockHashes. (I just added a second commit with a note to be explicit about this guarantee).

Offhand, I think I'd want the opposite guarantee. If I'm subscribed to blocks via chainHead_follow, and then I submitAndWatch some transaction, then in my ideal world the finalized/inBlock etc notifications emitted from submitAndWatch would always come after I've already heard about that block hash from chainHead_follow.

The reason being that chainHead_follow is what determines which blocks are "pinned", ie which blocks I'm able to request details for, and so if I see some block hash mentioned in submitAndWatch it'd be nice to know that I can actually request details for that block without hitting any "block isn't pinned" errors.

It's the other way around, actually. Precisely because I want to unpin ASAP, I want to have this guarantee. The capi-client intends to provide an API which will allow it to automatically unpin blocks, because we will have the guarantee that the user has already requested all the stuff that they need from that block.

So, imagine a situation in which I know that all the storage queries that the user wants to make for that finalized block have been made, that the client has also extracted all the other information that is relevant for that block, and that the client has already provided that information to the consumer... So, logically, now I want to unpin the block b/c it looks like I won't be needing it anymore, right?

Well, wrong, b/c without the guarantee that I'm requesting, I can't do that: maybe an event comes later saying "hey, you know what, that transaction that was pending... guess what? it got finalized in that block that I told you about before", but by then I have unpinned the block and I can't make further requests to check which events/errors the transaction has produced. On the other hand, if I have the guarantee that the events produced from submitAndWatch come before, then by the time that I receive the finalized event of chainhead I know exactly all the stuff that needs to be done before I can safely unpin the block.

Without these guarantees, those events are useless to me, b/c I won't be able to unpin unless I have requested the body of the block and made sure that none of the pending transactions are present in it... which is exactly what I will end up doing, and the reason why I'm now suggesting to remove those events from transaction_unstable_submitAndWatch, b/c IMO they create more noise than signal.
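To illustrate what the requested ordering would buy, here is a sketch (with hypothetical helpers) of the bookkeeping it would enable: if every submitAndWatch event for a block is guaranteed to arrive before chainHead's finalized event for that same block, then all pending transaction work for a block is known by the time chainHead reports it, and the block can be unpinned as soon as that work is done:

```ts
// Work registered per block hash by the submitAndWatch handler (e.g. fetching
// a finalized transaction's events/errors), which fires first under the
// requested guarantee.
const pendingWork = new Map<string, Array<() => Promise<void>>>();

function onTxFinalized(blockHash: string, fetchTxDetails: () => Promise<void>) {
  const tasks = pendingWork.get(blockHash) ?? [];
  tasks.push(fetchTxDetails);
  pendingWork.set(blockHash, tasks);
}

// chainHead_unstable_follow `finalized` handler: by now all transaction work
// for these blocks has been registered. `unpin` is an assumed wrapper over
// chainHead_unstable_unpin.
async function onChainHeadFinalized(
  finalizedBlockHashes: string[],
  unpin: (hash: string) => Promise<void>,
) {
  for (const hash of finalizedBlockHashes) {
    for (const task of pendingWork.get(hash) ?? []) await task();
    pendingWork.delete(hash);
    await unpin(hash); // safe: nothing else can still need this block
  }
}
```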

josepot commented 1 year ago

Removing the tracking of blocks is possible only if it becomes the responsibility of the JSON-RPC client to inform the server when the transaction has been finalized.

I'm actually cool with this. I'd much rather have this than what we currently have, TBH. So, IIUC, the way it could work is that the JSON-RPC server would keep broadcasting until the client calls the transaction_unstable_unwatch function for that subscription, right? And therefore it's the responsibility of the client to let the server know when to stop broadcasting, yep?

Yes, please, let's do that! 🙏

EDIT: If we end up going down this road: it would probably be a good idea to rename transaction_unstable_unwatch to something like transaction_unstable_done (or stop).
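If this direction were adopted, the client-driven flow might look roughly like the following sketch; `rpc` and `finalizedBodies` are hypothetical stand-ins for the client's JSON-RPC plumbing and for whatever machinery it already has for following finalized blocks and fetching their bodies:

```ts
// The server validates and then broadcasts periodically; it no longer watches
// blocks itself. The client watches finalized block bodies and tells the
// server when to stop via transaction_unstable_unwatch (or its proposed
// rename: done/stop).
async function submitUntilFinalized(
  rpc: (method: string, params: unknown[]) => Promise<any>,
  finalizedBodies: AsyncIterable<{ hash: string; extrinsics: string[] }>,
  txHex: string,
): Promise<string | undefined> {
  const subscription: string = await rpc(
    "transaction_unstable_submitAndWatch",
    [txHex],
  );
  for await (const block of finalizedBodies) {
    if (block.extrinsics.includes(txHex)) {
      // The transaction is in the finalized chain: stop the broadcasting.
      await rpc("transaction_unstable_unwatch", [subscription]);
      return block.hash;
    }
  }
}
```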

tomaka commented 1 year ago

I'm not super happy with "just removing the two events". As I've also mentioned above, this JSON-RPC API is not just about UIs. Use cases also include, for example, exchanges that want to submit transactions and have a guarantee that they've been finalized, or people writing utilities targeting their own node.

josepot commented 1 year ago

I'm not super happy with "just removing the two events". As I've also mentioned above, this JSON-RPC API is not just about UIs. Use cases also include, for example, exchanges that want to submit transactions and have a guarantee that they've been finalized, or people writing utilities targeting their own node.

I get your concerns about not wanting to simply yank out those two events. But, referencing the Objectives section of the spec, it’s clear: The JSON-RPC interface is primarily designed for intermediary library developers. Its complexity is intentionally traded off for making functions more explicit and predictable:

End-user-facing applications would normally not directly use the JSON-RPC interface, but an intermediary layer library built on top of the JSON-RPC interface. The JSON-RPC interface is a bit complicated to use, in exchange for making functions more explicit and predictable.

This is why I believe the spec should cater more to those building intuitive layers on top. Take our CAPI example: We first built a foundational client library. This isn’t just for UI stuff but can serve other non-UI projects effectively (like all the ones that you have mentioned, actually). I'm really hoping to see more of such foundational libraries, in various languages, sprouting up and enriching the ecosystem.

I mean, let's be real – managing a transaction broadcast isn’t any trickier than efficiently handling the unpinning of chainhead blocks, right? Meaning that these "foundational libraries" could easily handle that behind the scenes.

Bottom line: I believe those events might add more noise than signal, and they seem to implicitly couple two distinct function groups. Since the libraries built on top will likely deal with these things anyway, why not place that responsibility squarely in their court? This shift would also allow libraries to address the needs of the non-UI scenarios you've highlighted.

But hey, I already made my points, so if you choose to keep those events, I'll just pretend they aren't there 🙈. However, some clarity in the spec about the events’ timing, especially in relation to the chainhead finalized event, would be super helpful. It'd be great to have it explicitly mentioned that if someone's eager to unpin ASAP, relying solely on these events might not be the wisest thing to do.

jsdw commented 1 year ago

It's the other way around, actually. Precisely because I want to unpin ASAP, I want to have this guarantee. The capi-client intends to provide an API which will allow it to automatically unpin blocks, because we will have the guarantee that the user has already requested all the stuff that they need from that block.

Personally that's the opposite of what I'd want in Subxt though; I'd like to know that if I see a block hash in a response, I can actually ask for details about that block.

In Subxt, we let a user submit and watch a transaction, and when it's finalized, we take that block hash and use it to get the events to know whether the tx was successful or not (in the highest level API) and report that back to the user.

If you get the "finalized" event first, the block may not be pinned yet per chainHead pinning stuff, and so you'd have to wait until you see the block to be able to query for the events for the extrinsic you've already been told was finalized, which to me feels less than ideal.

Perhaps we just have different notions of what the high level APIs will look like, but in any case, I like the idea that if you're told about a block hash from the backend, that block is pinned (unless you've explicitly unpinned it already). I think that feels consistent.

As I've explained above, this is not possible. The JSON-RPC server must continue broadcasting all transactions periodically until they have been included in the finalized chain.

Regardless of guarantees, since the node has to wait for the tx to be finalized anyway (or otherwise error out), it feels a bit pointless to not report those events to the user. Even if they aren't useful in one case, they may as well be provided, and if the client doesn't care then it can ignore them.

And anyway, the client needs to know when it can stop caring about some "submitAndWatch" subscription, so the most you could do would be to just remove the hashes from those events, right? Which seems a bit pointless to me (they can easily be ignored if not useful, and provide useful information in at least some cases otherwise).

tomaka commented 1 year ago

But, referencing the Objectives section of the spec, it’s clear: The JSON-RPC interface is primarily designed for intermediary library developers. Its complexity is intentionally traded off for making functions more explicit and predictable:

This applies specifically to the End-user-facing applications section of the page. The justification is that the JSON-RPC API is too difficult for them to use directly anyway, so there's no point in providing some functions that are easy to use, as there will always be other functions that aren't easy to use.

For the Node operators and Debugging developers use cases, though, they're expected to use the API directly. Or rather, we don't want to force them to go through an intermediary layer if they don't want to, as that might add a lot of friction.

However, some clarity in the spec about the events’ timing, especially in relation to the chainhead finalized event, would be super helpful. It'd be great to have it explicitly mentioned that if someone's eager to unpin ASAP, relying solely on these events might not be the wisest thing to do.

It is actually already mentioned under bestChainBlockIncluded:

Note: Please note that there is no guarantee that the mentioned block matches any of the blocks returned by chainHead_unstable_follow.

There's not even a guarantee that the blocks are the same as the ones yielded by chainHead_follow at all.

It's not just a hypothetical thing. It can realistically be the case, if you use a load balancer and some of the load-balanced nodes are lagging a bit behind, that the blocks produced by transaction_watch and the ones produced by chainHead_follow don't match at all.

jsdw commented 1 year ago

Regardless of any guarantees (and I understand now that they may not even be possible to provide, and anyway I'd want the opposite guarantee from Josep :D), I basically think that the events should stay, as is, because it's useful for the client to know when to stop listening for "submitAndWatch" events anyway (ie when it receives "finalized"), and then why not also tell it the block hash since that information is there?

josepot commented 1 year ago

Personally that's the opposite of what I'd want in Subxt though; I'd like to know that if I see a block hash in a response, I can actually ask for details about that block.

I mean, sure... but then how do you know when to unpin a block? Because I'm sure that in Subxt you must also plan on unpinning ASAP, right?

In Subxt, we let a user submit and watch a transaction, and when it's finalized, we take that block hash and use it to get the events to know whether the tx was successful or not (in the highest level API) and report that back to the user.

So do we in CAPI, of course...

If you get the "finalized" event first, the block may not be pinned yet per chainHead pinning stuff

Not yet, but you are guaranteed that the finalized event for that block is "about to arrive", so by the time that it arrives you can make the relevant requests, and then you can safely unpin the block if there is nothing else that needs to be done, right? You could interpret that as a "pre-pin" 🙂.

Anyways, with the current spec you can not rely on one behaviour or the other b/c the events could come before or after, so it's not like those events are going to be of any use to you either in that regard, right?

and so you'd have to wait until you see the block to be able to query for the events for the extrinsic you've already been told was finalized, which to me feels less than ideal.

less ideal than not knowing when to unpin? I disagree, sorry.

Perhaps we just have different notions of what the high level APIs will look like, but in any case, I like the idea that if you're told about a block hash from the backend, that block is pinned (unless you've explicitly unpinned it already). I think that feels consistent.

I don't think so, no. I think it boils down to the fact that I want to make sure that I don't stay pinned for longer than necessary, because, well, if I don't have good logic for unpinning then things will blow up. So yeah, I care quite a bit about having the required guarantees so that I can safely unpin whenever I no longer need to use a block.

You still have not explained what your solution for unpinning effectively is with the guarantee that you prefer, or with the lack of guarantees that we currently have.

josepot commented 1 year ago

because it's useful for the client to know when to stop listening for "submitAndWatch" events anyway (ie when it receives "finalized")

I will know when I should stop listening for "submitAndWatch" regardless of those events, because I will do so when a transaction that has been broadcast and hasn't errored appears in the body of one of the blocks reported by the finalized event of the chainHead_unstable_follow function...

jsdw commented 1 year ago

You still have not explained what your solution for unpinning effectively is with the guarantee that you prefer, or with the lack of guarantees that we currently have.

Mmm, I have two competing ideas and need to decide on which approach I take. Roughly:

Idea 1:

Idea 2:

I can see that with idea 1, a guarantee after would be simpler for me, and with idea 2, a guarantee before might be simpler!

I'll probably be exploring idea 2 ultimately but may well start with 1, since it's a subset of 2 anyway (ie both will auto-unpin blocks that are reaching the end of their lifetime, and then 2 adds smarter unpinning on top). I'll start working on some of these things soon and will see what happens :)

I will know when I should stop listening for "submitAndWatch" regardless of those events, because I will do so when a transaction that has been broadcast and hasn't errored appears in the body of one of the blocks reported by the finalized event of the chainHead_unstable_follow function...

Here we are in a different position, because your implementation will be light client only, and since Smoldot has downloaded the block bodies already (afaiu), there's no overhead with your API getting them all and doing that check manually.

Subxt can connect to RPC nodes too, and in that case, having to download block bodies (which could be quite large) is much more expensive if it can be avoided by watching for the appropriate events and hashes instead and just following block headers otherwise.

josepot commented 1 year ago

Here we are in a different position, because your implementation will be light client only

I'd like to address what seems to be a common misunderstanding regarding CAPI. We are not developing a "light client only" solution. Instead, our focus is on a "light client first" approach, and there's a significant difference between the two.

The "light client first" philosophy means that our library is primarily built with the light client in mind. However, it isn't limited to just that. Our only expectation from the JSON-RPC provider is the proper implementation of the chainhead group of functions. Whether the server behind the provider is a light-client, a substrate RPC node or anything else is secondary to us. As long as the chainhead group of functions are implemented correctly according to this spec, then CAPI should work with it. Other functionalities (even broadcasting transactions) are considered extras.

This is important for a number of reasons. One of them is that it enables a smoother migration for dApps, especially those using PolkadotJs. Allowing them to use both PJS (which will use the legacy JSON-RPC API) and CAPI (which will use the new JSON-RPC API) concurrently, while talking to the same server, makes the transition possible. Once the migration is complete, dApps can then shift to a light-client provider exclusively.

I had been under the impression that both CAPI and Subxt were aligned in this "light client first" direction. However, your latest comments suggest otherwise. While this is surprising and, in my opinion, perhaps a missed opportunity, I respect and understand that different projects may have their unique reasons and trajectories.

tomaka commented 1 year ago

having to download block bodies (which could be quite large) is much more expensive if it can be avoided by watching for the appropriate events and hashes instead and just following block headers otherwise

Please do not worry about non-existing performance problems. We can solve performance problems later, for example by adding a JSON-RPC function that asks the node whether a transaction is in a body. The main objective right now is correctness and ease of use, performance in general really doesn't matter to me at this point.

Block bodies can in theory be quite large. In practice they aren't. Block bodies need to be transferred between the validators of the blockchain within a 2-second interval, so it's not like they can grow to more than a few megabytes at most.

jsdw commented 1 year ago

I had been under the impression that both CAPI and Subxt were aligned in this "light client first" direction. However, your latest comments suggest otherwise. While this is surprising and, in my opinion, perhaps a missed opportunity, I respect and understand that different projects may have their unique reasons and trajectories.

Subxt will eventually be light client first, but it's been around and used for a while and so is still using the old APIs (and for this reason also can't afford to "only" support the new APIs until they are stable, though it will have unstable support for them soonish). I also still think that there will always be valid use cases for being RPC-node friendly, such as if you run your own full node and want to talk directly to that or what have you, but eventually the light client interface (which we already support now, albeit via the old APIs and marked "unstable") will be the main one that we push :)

Please do not worry about non-existing performance problems. We can solve performance problems later, for example by adding a JSON-RPC function that asks the node whether a transaction is in a body.

Sure, but I'll always tend towards using whatever APIs are made available in a way that is more performance friendly, and watching for a hash in a small event is more friendly on the Subxt side than downloading block bodies and checking.

But, that said, I can see that downloading block bodies has a certain elegance to it, especially in light of the transaction call not offering any guarantees about the hash it reports, so I'll probably explore that approach too. It would also enable me to unpin blocks sooner, because I don't have to keep anything around to match up to the hash that comes back, which is pretty handy!

josepot commented 1 year ago

As I explained in this comment, I reached the conclusion that separating the transaction functions from chainhead was a mistake. So I'm closing this PR, because I will be creating a separate PR for integrating the transaction functions into chainhead.