Let's talk about the state transition model. In general, a transaction can be in one of the following states:
Each of these requires separate analysis.
So basically a transaction's state cannot be considered finalized until its inclusion, expiration, or TaPoS invalidation has occurred and has passed the last irreversible block.
The reason I'm talking about this is that we need to precisely define, in terms of the actual transaction lifecycle through the above states S1-S6, (1) what semantics the current API provides, and (2) what semantics API consumers need / expect for particular applications.
The first question -- what semantics the current API provides -- is by far the simpler question to answer. The current API simply provides an error when a transaction is immediately rejected (S1), success when a transaction is initially accepted (S2), and a callback (server-push) notification the first time a transaction enters (S4) -- i.e. is included in a block -- from any other state.
The current API does not provide notification of any other state transition -- clients have no way to detect forking [1], expiration [2], or non-immediate rejection/acceptance.
[1] Forking can in principle be handled by a delayed node, but I'm not sure if any client actually does this.
[2] Expiration can in principle be handled by client-side logic, but I'm not sure if any client actually does this. In particular, it seems a bit much to ask clients to handle the tricky issue of a fork potentially un-expiring expired transactions.
The simplest possible API which we can provide is to simply allow a client to issue a polling query like this:
```
check_transaction_state( txid, ref_block_num, ref_block_prefix, expiration )
```
This call will return a JSON data structure including:
- `is_irreversible` : a flag which, if true, means the transaction can never enter a different state

Whenever a client submits a transaction, the client would then make the polling call every 3s or so until the tx enters some state irreversibly. The downside is that having a large number of polling calls for every transaction greatly increases the call load on the server for steemit.com.
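As a rough sketch, the return value might be declared something like this (only `is_irreversible` comes from the description above; the rest, including the enum name, is illustrative):

```cpp
// Hypothetical sketch, not an existing steemd structure. `transaction_state`
// stands in for an enumeration of the states S1-S6 discussed above.
enum class transaction_state { s1, s2, s3, s4, s5, s6 };

struct check_transaction_state_return
{
   transaction_state state;                   // which state the transaction is in
   bool              is_irreversible = false; // if true, the state can never change
};
```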
Since the polling call's implementation wouldn't require full blockchain state, but rather only the txid's of recent blocks and the last irreversible block number, perhaps the best option is an out-of-process implementation which can easily be scaled up. An out-of-process implementation might also be able to provide a more event-driven API than pure polling, such as long polling or websocket notification.
Thoughts?
I had a conversation with @bytemaster about this, and he had some good insights, which I'll attempt to summarize.
A better breakdown of states would be: Initially rejected (R0), tentatively rejected (TR), tentatively confirmed (TC), irreversibly rejected (IR), irreversibly confirmed (IC).
A transaction which is IR or IC cannot change state; a transaction which is TR or TC can change state; a transaction which is R0 cannot change state for practical purposes [1].
[1] It could change state if the originator submitted it to a different node in the past, or will submit it to a different node in the future. Basically you treat R0 as IR if you trust the originator not to do multiple submissions [2], and TR if you don't.
[2] Clients can trust the originator to have this property for transactions they originated, as long as they connect to a trusted server and throw away tx's they submit which get an R0 response.
I haven't really written any code for this, releasing my assignment.
Implementing this may help SMT creators use TaPoS to ensure SMT creation is atomic #1795, and may also help steemit.com and third-party apps create an improved UI for tracking and dealing with expired transactions.
For practical purposes, there needs to be a 6th transaction state, Unknown (U), representing a transaction that has not been initially rejected, tentatively rejected, or irreversibly rejected, and has not yet been included in a block. This state represents the case where the transaction may need to be rebroadcast.
The API call should accept the transaction and return an enum of these states.
This call should live in `network_broadcast_api`.
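As a concrete sketch, the returned enum might look like this (the identifier names are my assumptions; the six states themselves are from the discussion above):

```cpp
// Hypothetical sketch: the six transaction states named in this thread.
// Identifier names are assumptions, not an existing steemd type.
enum class transaction_state
{
   initially_rejected,     // R0 -- rejected when first pushed
   tentatively_rejected,   // TR -- rejected, but could still change state
   tentatively_confirmed,  // TC -- included in a block, not yet irreversible
   irreversibly_rejected,  // IR -- can never be included (e.g. expired past LIB)
   irreversibly_confirmed, // IC -- included in an irreversible block
   unknown                 // U  -- not rejected, not yet included; may need rebroadcast
};
```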
> there needs to be a 6th transaction state Unknown (U) that represents a transaction that has not been initially rejected, tentatively rejected, or irreversibly rejected
This depends on how the API is designed. I originally envisioned the transaction query states as return values from the broadcast API. That gives you semantics of "if the transaction is in state U, broadcast it if you don't know about it," which means the RPC caller can never observe state U: by the time the call returns, the transaction will have moved to accepted or to failed local push. If you want to allow a separate query-only operation which asks "what is the status of transaction `<txid>`?", that's fine, but then the caller also needs to provide TaPoS and expiration if you want to be able to tell the client that the transaction has expired or failed TaPoS.
Ok, I see. You were assuming that this would be an additional return type for `broadcast_transaction`. Currently, the return type is void, so this would mean creating a new return type.
I envisioned creating a new API method so that the semantics of the original method remain unchanged. They would otherwise change because, currently, a failure to push the transaction results in an exception being thrown, which is returned as a jsonrpc error message; under this proposal we would return the state instead. I am fine with your approach so long as the details of an error are still returned. Knowing why an op is rejected is critically important to the use of this API.
I propose the following return struct.
```cpp
struct broadcast_transaction_return
{
   transaction_state         state;
   optional< fc::exception > error;
};
```
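If adopted, this struct would presumably also need the usual fc reflection so it can be serialized over JSON-RPC, something like:

```cpp
// Assumed reflection macro usage, following the standard fc idiom.
FC_REFLECT( broadcast_transaction_return, (state)(error) )
```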
To support this API, we need to record locally when each transaction was included (to determine irreversibility). Currently, we only track the set of unexpired included transactions. That needs to be expanded to the entire 64k range of TaPoS blocks. With the existing `transaction_object` definition, this is quite expensive due to the inclusion of the packed transaction. However, the `packed_trx` field is not needed. It is used in `database::get_recent_transaction`, which is called by the p2p code, but if the `transaction_object` includes the block in which the transaction was included, and perhaps the transaction num in the block, we can look the transaction up via `get_block`. This is an addition of 8 bytes (two `uint32_t` fields) to the object, which is smaller than our smallest transaction.

This should decrease reindex times by a decent amount, because `packed_trx` is transitory state and is dynamically allocated. I imagine these savings more than make up for the added costs (both memory and computation) of keeping an index of all transaction IDs for the last 64k blocks plus the included block number in each entry.

My recommendation for the `transaction_object`:
```cpp
class transaction_object : public object< transaction_object_type, transaction_object >
{
   public:
      template< typename Constructor, typename Allocator >
      transaction_object( Constructor&& c, allocator< Allocator > a )
      {
         c( *this );
      }

      transaction_object() {}

      id_type             id;
      transaction_id_type trx_id;       // transaction ID, for lookup by txid
      uint32_t            block;        // number of the block that included this transaction
      uint32_t            trx_in_block; // index of this transaction within that block
};
```
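For illustration, the p2p lookup could then recover the full transaction from the block store rather than from state. A rough sketch, assuming a `fetch_block_by_number()`-style accessor (not necessarily the exact steemd API):

```cpp
// Hypothetical sketch: reconstruct the transaction from the block that
// included it, instead of keeping packed_trx in the state object.
signed_transaction get_included_transaction( database& db, const transaction_object& obj )
{
   auto block = db.fetch_block_by_number( obj.block ); // assumed accessor returning optional<signed_block>
   FC_ASSERT( block.valid(), "Block ${n} not found", ("n", obj.block) );
   FC_ASSERT( obj.trx_in_block < block->transactions.size() );
   return block->transactions[ obj.trx_in_block ];
}
```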
The refactor of the `transaction_object` and its use in the p2p code should be done in another issue that blocks this issue.
So I thought some more about this, and there are some subtle problems we have to consider. The only things the state database can tell you are:
In particular, given an architecture of clients making stateless calls to random nodes, the only way to distinguish between failed transactions and transactions that we don't know about is during the initial broadcast of a transaction. There is no way to detect a transaction that becomes invalidated due to a block (i.e. the example where transfer Alice -> Bob becomes invalid when transfer Alice -> Charlie is included in a block).
So the only information we can give to clients is:
[1] The state database doesn't track transactions outside the TaPoS window. We may want to have logic in Jussi to query Steemd for blocks in the TaPoS window, and also query SBDS for historical blocks.
[2] I think this ticket may predate our decision that all future API's should be stateless.
> Currently, we only track the set of unexpired included transactions. That needs to be expanded to the entire 64k range of TaPoS blocks.
Let's not change the dupe checking code. If the dupe checking tx cache isn't sufficient for the functionality we want to implement, let's put the additional needed objects in a plugin.
> Let's not change the dupe checking code. If the dupe checking tx cache isn't sufficient for the functionality we want to implement, let's put the additional needed objects in a plugin.
This does not change the fact that the current implementation can be modified to reduce reindex time and state size.
> I think this ticket may predate our decision that all future API's should be stateless.
If we do not perform writes, then it is stateless. My original idea for the API was to simply query the current state without actually pushing the transaction. Perhaps it is best to go back to the idea of separate push and query calls.
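Concretely, the separation might look something like this (signatures are assumptions, mirroring the earlier `check_transaction_state` sketch):

```cpp
// Hypothetical signatures only; neither is an existing steemd method.
void broadcast_transaction( const signed_transaction& trx ); // push only; semantics unchanged

broadcast_transaction_return find_transaction_state(         // query only; performs no writes
   const transaction_id_type& txid,
   uint16_t                   ref_block_num,
   uint32_t                   ref_block_prefix,
   time_point_sec             expiration );
```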
Let's consider how you're going to use this information if you're a client.
I picture the client implementing a transaction broadcast, followed by a query poll.
The problem here is that the client must set TaPoS so that the old and new transactions cannot both be included. If we want to support multiple re-submissions, we have to consider the possibility that there are multiple old transactions, and the new transaction must be newer than all of them.

This means the new transaction's TaPoS must refer to a block greater than the expiration of any of the old transactions, on a chain which does not include any of the old transactions. So in order for the client to provide this kind of atomicity, it must atomically query a list of transactions. The query returns a report of whether each transaction in the list is (or is not) included, along with the block ID and block number of some block whose timestamp is greater than the maximum expiration among the list.
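In other words, the atomic list query might return something shaped like this (all names are assumptions):

```cpp
#include <vector>

// Hypothetical return value for the atomic list query described above.
struct query_transaction_list_return
{
   std::vector<bool> included;      // one entry per queried txid: included on this chain?
   uint32_t          block_num = 0; // number of a block past every expiration, or 0 if none yet
   block_id_type     block_id;      // block_id_type() (all zeros) if some tx is still unexpired
};
```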
So how does the resubmission actually work? If the query returns `block_id_type()` (all zeros), then some old transactions are still not expired (according to the head block time of the node that answered the query), so the client declines to sign.

Now let's discuss how the query needs to compute the returned block number / ID. The only requirement is that the returned block ID is newer than the expiration of every transaction. (If the expiration of some transaction is still in the future according to head block time, then the query returns a block number of all zeros.) It would suffice to return the head block, but that is subject to forks. I suggest the algorithm return whichever is later: LIB or the first block newer than the expiration of every transaction.
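A minimal sketch of that selection rule, assuming helpers exist for mapping a block number to its ID and a timestamp to the first block after it (neither is claimed to be an existing steemd call):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hedged sketch: return the later of LIB and the first block newer than every
// expiration, or an all-zero ID if some transaction has not yet expired.
// Assumes the expiration list is non-empty.
block_id_type pick_reference_block(
   const std::vector<time_point_sec>&               expirations,
   time_point_sec                                   head_block_time,
   uint32_t                                         lib_num,
   const std::function<block_id_type( uint32_t )>&  block_id_of,        // block num -> block ID
   const std::function<uint32_t( time_point_sec )>& first_block_after ) // time -> block num
{
   time_point_sec max_exp = *std::max_element( expirations.begin(), expirations.end() );
   if( head_block_time <= max_exp )
      return block_id_type(); // all zeros: some old transaction could still be included
   return block_id_of( std::max( lib_num, first_block_after( max_exp ) ) );
}
```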
As discussed above, we probably want to implement a combining API which combines the result of a historical query (to SBDS, say, which indexes irreversible blocks) with the result of a current query (to a steemd node, which only indexes very recent blocks). The combining API implementation needs to know that:
So to implement this, what do we need?
The combining query fills the steemd query field from the head block of the historical query. Note that because of atomicity concerns, we cannot implement these as separate requests! Here's one possible failure mode:
This failure mode can be bypassed if the history node returns its current block number as part of the API call, instead of in a separate call. Similar reasoning leads us to conclude that steemd must do the same.
The above discussion sheds some light on how many transactions we need to track. We don't need to handle the entire 64k TaPoS window; we only need to handle LIB plus some threshold which is almost always greater than the maximum lag of the historical node. I suggest double irreversibility would make a good threshold: once a transaction was already irreversible as of LIB, you can forget about it for the purposes of this query [1].
How would this be implemented?
[1] The set of queryable transactions is not a subset of the transactions tracked for duplicate-checking purposes. Since irreversibility need not advance (for example, one witness producing on its own chain while all the other witnesses "miss"), it is possible for transactions to fall out of the TaPoS window without becoming unqueryable.
I agree. A store that tracks double irreversibility combined with the irreversibility store of account history provides complete transaction history for the purposes of this API.
The question we still need to answer is where should this live?
We are trying to keep things out of steemd that don't need to be in steemd, but I see value in having basic functionality with just steemd and a wallet. If we move account history to SBDS and require querying SBDS to confirm a transaction, then an exchange either needs to query api.steemit.com or run their own instance of SBDS.
I'm not averse to adding an option to index transactions by ID since the beginning of time. If you run an application that either doesn't care about older transactions, or uses an external solution for indexing them (e.g. SBDS), you can turn off the option and have smaller state size by limiting yourself to non-doubly-irreversible transactions.
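If that option were added, it could surface in the usual steemd plugin-options style, roughly like this (`track-all-transactions` is a hypothetical name, and `cfg` is the plugin's `options_description`):

```cpp
// Hypothetical option; shown only to illustrate how the toggle might be exposed.
cfg.add_options()
   ( "track-all-transactions",
     boost::program_options::bool_switch()->default_value( false ),
     "Index transactions by ID since genesis instead of pruning doubly-irreversible entries" );
```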
An alternative approach is to have a Docker container with pre-configured `steemd` and SBDS. And/or maybe a tutorial on how to manually set up SBDS, for those who won't / can't use Docker.
There seems to be a desire for indexing transactions by ID. https://github.com/steemit/steem/issues/2309
We are going to save transactions locally for double irreversibility plus 2400 blocks (2x the maximum transaction expiration of one hour, i.e. 1200 blocks at 3-second intervals). This should provide an ample buffer during which the key query for irreversible inclusion/exclusion can be answered.
The purpose of this data store is to answer this query for current transactions, not for historical transactions. `get_transaction` by ID for historical transactions should be handled by a more appropriate data store, such as SBDS.
Is the last comment implying #2309 will not be fixed? Could the buffer be configurable so that no purging happens for nodes which need it?
This issue isn't excluding #2309, but I would like to know a compelling reason for `get_transaction` to live in steemd as opposed to another service.
Closing, development issue #2458 opened.
The crash bug in #355 is due to the confirmation notification code. We currently notify clients when a transaction is confirmed. There are several related issues to consider: