Open vukolic opened 8 years ago
for section 2.1, we should try to use a UTC timestamp if possible: tsUTC.
for section 2.2, 1st alternative design, HASH(txPayload) seems preferable as we should try to avoid clear text wherever we can without losing the clarity of the specification.
Just some clarifications: in Section 1.2.i, should 'Clients create and thereby invoke transactions.' be 'Clients create and thereby invoke chaincodes'? In Section 2.1, should 'To invoke a transaction' be 'To invoke a chaincode'?
@gregmisiorek yes, hash would be preferred, especially in the case of confidential transactions. @adecaro at this level, we should use transaction instead of chaincode to be more generic.
A couple of comments / thoughts
@bmandler Many thanks for the very insightful comments! Let me respond quickly - addressing them fully requires more work, as discussed below:

1) Good point - this was originally the case in the design, until there was a remark that endorsers may be more general: peers that do not necessarily maintain the state. But in principle I agree that we could simplify, and an endorser will most of the time actually be a committing peer.

2) The idea is that transaction deployment is seen as an invocation of a system chaincode, where a certain set of (system chaincode) endorsers may need to endorse that a transaction is deployed. I again agree - the flow of the deploy is a TODO.

3) There was already a comment about this from @christo4ferris - will be added (TODO).

4) In principle, this would be an implementation optimization (there will be many of these). But, as you correctly point out, depending on the trust model for the endorsers of this chaincode, this optimization may be applicable only when a certain number (or set) of endorsers reply with STALE_VERSION (cf. Byzantine endorsers). I suggest addressing this by adding a comment along these lines to the design draft.

5) This is why deployment might need to go to all committing peers, not just to endorsers as you suggested in 2) - this will need to be made clear when the Deploy flow is outlined.

6) With the confidentiality enabled by this design, confidentiality would be violated by Byzantine endorsers specified by the chaincode. Notice that the integrity guarantees of such a chaincode execution could be maintained despite confidentiality violations. If this level of confidentiality is insufficient, the chaincode can/should resort to application (chaincode) level confidentiality techniques (which the design does not prevent).

7) A submitting peer submitting a tx with a wrong endorsement is handled by endorser signatures.
As for interactions with consensus - it is the job of the consensus module to deal with Byzantine submitting peers (as Byzantine consensus clients) and Byzantine committing peers. Byzantine committing peers are simply learners in Paxos terms, so it is easy to tolerate any number of Byzantine committing peers from the consensus perspective. Dealing with Byzantine committing peers for queries will need to be spelled out - when we spell out queries.

8) This means that a submitting peer, acting as a committing one, will be able to detect an invalid tx and act upon it as agreed with the client (cf. retryFlag, etc.)
BTW, if you want to volunteer for writing one of these TODOs (e.g., deployment/queries/anything else) - pls ping me so we coordinate. Thanks again!
Typos:
Hi guys
Thanks for the good work on the Next Consensus Architecture Proposal. I like the general direction of moving away from a monolithic architecture, allowing for more flexibility and separation of concerns. I have a bunch of comments at varying levels of detail. They are listed more or less in order of the relevant parts of the proposal. These comments were written while reading this version. Please excuse the ugly long numbered list but this may facilitate easier referencing for further discussion
This sentence in Section 2.5 is confusing: "For example, once can provide serializability when for all keys in the readset and writeset, the current version in the state is still the same as the one in the readset, and all other transactions are rejected."
How about: "For example, serializability can be provided by requiring the version associated with each key in the read set or writeset to be equal to that key's version in the state, and rejecting transactions that do not satisfy this requirement." ?
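The check in the suggested wording can be sketched in code. This is an illustrative sketch only, assuming a simple key-to-version map; the names (`State`, `validate`, `apply`) are hypothetical and not from the proposal:

```python
# Sketch of MVCC-style serializability validation: a transaction commits
# only if every key it read still has the version it observed.
class State:
    def __init__(self):
        self.values = {}    # key -> current value
        self.versions = {}  # key -> current version number

    def validate(self, readset):
        """readset: key -> version observed during simulation.
        True iff no key read by the transaction has changed since."""
        return all(self.versions.get(k, 0) == v for k, v in readset.items())

    def apply(self, writeset):
        """Commit the write set and advance the version of each written key."""
        for k, v in writeset.items():
            self.values[k] = v
            self.versions[k] = self.versions.get(k, 0) + 1
```

A transaction whose readset matches the current state versions is committed; any transaction that simulated against a stale version is rejected, which is exactly the obstruction-free behaviour discussed later in this thread.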
Hope some of this is helpful.
@mm-hl Thanks! I am again going to focus in this response only on the vital things:

3) To be fixed - probably by making endorsing peers also committing peers.

4) Ack - to be fixed.

5 + 14) Let me tackle this one early on: the approach to handling concurrency is a critical one. Version dependencies were chosen after weeks of (sometimes heated :) discussion with @chetmurthy and @cca88. Although this probably deserves a separate issue, let me elaborate a bit here. In principle, we have two approaches to handling concurrency at hand:

a) version dependencies (verDep): these are typically used in the context of databases, where a concurrent change to an object is conditional on the object not having changed in the meantime. This provides the liveness level typically called obstruction freedom in distributed computing. In DB parlance we need to specify isolation guarantees, which, as you correctly point out, may be specified per chaincode and not globally.

b) sequencing through a leader (leader): in this approach, taken by many distributed systems, a leader endorser sequences concurrent requests and resolves concurrency. This approach is taken by the Sieve protocol, the current HL fabric prototype consensus protocol (http://arxiv.org/abs/1603.07351).

Both approaches have pros and cons. Again, this requires a separate issue, but I will try to put out the essence here. "verDep" PROS:
@vukolic thanks for the quick and detailed response. Minor responses in this comment, longer discussion in the next.
7) Doesn't seem to address my comment, not sure what you are saying.
8) All of the things mentioned (including PROPOSE and SUBMIT messages) contain signatures, according to my reading of the proposal. My comment is about what is covered by the signature and specifically about why the message type is covered in some cases and not in others. I can see that it's potentially important for the signature to cover the TRANSACTION-[IN]VALID, so nobody can lie about the determination, but I can't see that it's important not to cover PROPOSE and SUBMIT, and all else being equal consistency is probably preferable. No big deal, I was just curious if this was intentional.
9) My comment doesn't count as a vote to keep the roles of consenters separate from committers. It's a suggestion to keep the functionality separate, and (along with comment 11) to generalise the protocol to cleanly cover a wider range of configurations and not have to specify what gets sent to whom and in what order.
15) Yes I think that would be clearer.
20) Sorry, yes I meant "batch" where I said "blob" (first instance in comment 20).
21) Above typo notwithstanding, I stand by my comment that the sentence seems contradictory. (Minor nit, just trying to help improve presentation.)
22 + 26) Not sure what you mean. If it is "required", for whom does this create an obligation? At what granularity might one opt out if it were options? Per implementation? Per Network? Per peer?
23) I emphasize "in such a way that they affect the hashes required to verify valid transactions" in my comment. I agree that making the consensus fabric process opaque blobs means that it has no option but to include invalid transactions in its output. That does not prevent committers [edit: this previously mistakenly said "consenters"] (who can determine that they are invalid) from maintaining a ledger representation that keeps them out of the way, thus facilitating easy pruning and fast sync for new peers, which is what I attempted to illustrate in comments 24-26.
Responding to @vukolic's 5 + 14):
First, I find it difficult to tell from the proposal what the role/purpose of endorsers is intended to be, which I think affects this conversation significantly, because it's hard to figure out what motivates what. Here are some of the reasons I'm struggling with this and some "reading between the lines" that may help me understand and you clarify:
I note that the proposal does not require committing peers to be able to execute transactions. My comments were coming from a perspective where it would be reasonable for them to be able to do so. Perhaps avoiding this is a key motivation for the proposed design? I can see potential advantages including:
If I reread your response with an assumption that avoiding committers executing transactions is a/the key motivation, it makes a lot more sense to me, so I think I am understanding the proposal and its intentions better now. If this is correct, the proposal would be much improved by upfront discussion of this motivation, probably in Section 1.3. I'll proceed with the assumption that I've got that right.
First, wouldn't it make more sense that the submitter would send the txpayload to endorsers, and the endorsers would execute the chaincode and construct the transaction? That way, the submitter would not need to be able to execute the transaction, thus supporting confidentiality. Presumably endorsement policies would (usually) require that all endorsers produce the same transaction; otherwise, what would make sense to do?
This raises a question: against what state do (submitters and) endorsers run the transaction? Even if chaincodes are deterministic and all participants are honest, they may end up with different state updates and verdeps if they run against different states. Maybe the submitter should nominate a state to run against, and endorsers could endorse only when/if they have caught up to that state?
The key advantage of obstruction freedom is that it allows us to decouple synchronization from scheduling/contention control, while admitting much simpler mechanisms than are required for lock-freedom and wait-freedom. But obstruction freedom without effective contention control is usually a disaster. I vaguely thought about this when considering the retryFlag. Is that its intended purpose? If so, have you thought much about what kinds of retry policies could be implemented and would make sense? Presumably endorsers would be in the best position to implement retry policies, and thinking about this in the context of your comments about Eve starts to make some sense. For example, endorsers could potentially work together (perhaps, but not necessarily, via a leader) to combine transactions from the same chaincode into orders or batches that might somehow be processed more efficiently by committers, somewhat analogous to the "mixer" in Eve. This might be mostly internal to endorsers, but the protocol could help by enabling endorsers to communicate ordering/batching dependencies to submitters (or directly to the consensus fabric).
Coming back to the potential motivation for reducing transactions to state updates and version dependencies pre-consensus, it may help to improve throughput for committers, but:
One more small comment on terminology. I find it slightly strange in Section 1.2, bullet 2.a that a committing peer "commits the transactions and maintains the state" but is called a "read-only peer" (if it is not also playing other roles). I get that the intention is that it can only support read-only queries from clients if it does not play these other roles, but nonetheless the terminology seems slightly strange to me given that it is maintaining the state and applying updates to it.
@mm-hl quickly re 5+14)
The idea of HASH(txPayload) in the alternative design in 2.2 is misplaced - HASH(txPayload) (or tid) should be sent to the consensus service in 2.4 for confidentiality, so tran-proposal cannot simply be reused from 2.2 in 2.4. The PROPOSE message (Sec 2.2) HAS TO carry the full txPayload to endorsers for - indeed - execution of the transaction. I will make this clear asap since this is critical.
[EDIT: this is now fixed by having tran-proposal containing HASH of the payload, while PROPOSE contains txPayload, explicitly and outside tran-proposal.]
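The fixed layout described in the edit can be sketched as follows. Field names (`payloadHash`, `tranProposal`, `submitterID`) are assumptions for illustration, not the proposal's actual wire format:

```python
import hashlib

def make_propose(tx_payload: bytes, submitter_id: str) -> dict:
    """Sketch of a PROPOSE message per the edited design."""
    tran_proposal = {
        "submitterID": submitter_id,
        # Only HASH(txPayload) goes inside tran-proposal, so tran-proposal
        # can later be forwarded to the consensus service without
        # revealing the payload (confidentiality).
        "payloadHash": hashlib.sha256(tx_payload).hexdigest(),
    }
    # PROPOSE carries the full txPayload explicitly, outside tran-proposal,
    # because endorsers do need it to actually execute the transaction.
    return {"type": "PROPOSE",
            "tranProposal": tran_proposal,
            "txPayload": tx_payload}
```

The point is the asymmetry: endorsers receive the payload, while everything that flows onward to consensus references it only by hash.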
As for your remarks on confidentiality - precisely - this is one of the main motivations of this approach. I am preparing an extension of the document that will deal with confidential chaincode examples in this architecture so things should become more clear then.
As for synchronization for obstruction freedom - the expectation here is that the data model will permit less synchronization (e.g., "spend-once" UTXO). If this turns out not to be the case - we will need synchronization (e.g., leader) in the vein of Sieve/Eve - perhaps only for state partitions that hold such objects.
@mm-hl re 9 + 23)
" That does not prevent consenters (who can determine that they are invalid) "
In fact consenters - in the current proposal - have no way to determine whether a tx is invalid or not. This can be only done by (committing) peers.
The design is currently general in that it separates consenters from (committing) peers. The main motivation is to allow an easily pluggable, generic consensus service that does not have to care at all about chaincode state, verifying endorsement policies, and the like. Such a consensus service can inherently be as scalable and performant as possible, since execution/validation is totally removed from the critical path.
The cost of this genericity and modularity is that consenters (the consensus service) cannot tell valid and invalid transactions apart. This needs to be done at the (committing) peer level. If this is a show-stopper, it is fairly easy to specialize this architecture to have consenter >= committing peer. This would come at the expense of the advantages listed above.
Besides 5+14 - this is another big design decision that we as HL community need to take.
Including stateUpdate and (more importantly) verDep in the PROPOSE message to endorsers allows endorsers to be (internally) sharded on the chaincode state keys.
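For illustration, such internal sharding could be as simple as a deterministic key-to-shard map; this is a sketch of the idea, not anything specified in the proposal:

```python
import hashlib

def shard_for_key(key: str, num_shards: int) -> int:
    """Deterministically map a chaincode state key to an internal
    endorser shard. Because PROPOSE already names the touched keys
    (via stateUpdate/verDep), the message can be routed to the shard
    responsible for those keys."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```

Every endorser replica computes the same mapping, so no coordination is needed to agree on which shard handles which keys.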
@mm-hl: Thanks for your comments and the discussion.
Re 13) Your text says what was meant, this is better.
Re 14) Yes, this would essentially be a transaction manager running together with consensus on blockchain.
Re 26) The proposal mentions this: it's the optional "(Blockchain) State" data structure that can be part of the hash chain, included through the root of a hash tree on the state, just as you suggest and as done in Ethereum. Makes a lot of sense as an optimization, not strictly needed for security. We will need to decide whether to have this.
@vukolic Sorry, I caught myself a number of times saying "consenter" when I meant "committer" but obviously didn't catch them all. Yes, I totally understand and agree, and my sentence was about what consenters store, nothing to do with committers.
@corecode True, but at the expense of confidentiality from the submitter, since it has to execute the transaction in order to determine the state update.
@cca88 The "(Blockchain) State" mentioned in the proposal is described simply as a KV store. I don't see anything that suggests it's stored as a Merkle structure, or that its Merkle root could/would be included in checkpoints, which is what is needed for supporting fast sync.
Also @cca88 I don't get your comment about 14). Are you agreeing that different consistency conditions for different chaincodes could make sense and commenting on how this would be implemented?
@mm-hl: Right, this was confusing: I thought that optionally including the blockchain state also meant to include it inside 4.2 as part of the hash chain. (That hash chain now is built over the VL.) Will clarify this option.
14) Yes, agree. IMHO the design moves towards a transaction manager (TM) implemented inside the logic for applying the updates; the TM ensures a chosen consistency condition (isolation guarantee). The conditions could not be implemented by the chaincode itself; only ones hard-coded in the fabric seem possible. Chaincode deployment can pick the isolation guarantee that it wants and that the programmer understands. There is a line of literature, by Kemme, Jimenez-Peris, Patino-Martinez, Alonso, Schiper and others, that has investigated how to run a replicated DB without a central point of control (over a reliable group communication system, tolerating crashes only). It is summarized in a textbook (Database Replication, Morgan & Claypool). When extending the approach here to full generality, it would become the equivalent of their "database replication", but in the BFT model.
Issue: committing peers can change the order of blobs from consensus.
The question of Byzantine committing peers was already raised. However, while it is true that a Byzantine committing peer cannot alter the content of a blob provided by the deliver() event of the consensus, it seems to me that it can change the order between blobs. That is, the deliver provides {sequence-num, blob, hash}; anyone can compute a new hash for a different {sequence-num', blob} pair. For the ordering provided by the consensus to be resistant to forgery it must be signed by some entity, not just hashed.
@tock-ibm
Byzantine committing peers can indeed do what they want with their raw ledger, producing a hash chain that verifies but is bogus. However, this is not an issue, since such Byzantine peers cannot convince any honest peer of their chain. Namely, in the state transfer and checkpointing protocol (Sec 5), an honest peer gets the head (tip) of the raw ledger hash chain from the consensus service and then resolves it back via peer-to-peer communication.
Now, whether the consensus service delivers multi-signed batches to peers, or delivers via f+1 consenter confirmations - this is up to the consensus service implementation.
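The state-transfer argument can be illustrated with a minimal hash-chain check: given a trusted head obtained from the consensus service, a peer recomputes the chain over a claimed blob sequence, so any reordering by a Byzantine peer fails verification. A sketch under assumed encodings (function names and the genesis value are illustrative):

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value for the chain

def link_hash(prev_hash: bytes, seqno: int, blob: bytes) -> bytes:
    """Hash of one ledger entry; covering prev_hash binds the order
    of entries into the chain."""
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(seqno.to_bytes(8, "big"))
    h.update(blob)
    return h.digest()

def verify_chain(trusted_head: bytes, entries) -> bool:
    """entries: list of (seqno, blob) in claimed delivery order.
    True iff recomputing from genesis reproduces the trusted head."""
    prev = GENESIS
    for seqno, blob in entries:
        prev = link_hash(prev, seqno, blob)
    return prev == trusted_head
```

A Byzantine peer can fabricate an internally consistent chain, but it cannot make a reordered sequence hash to the head that the honest peer obtained from the consensus service.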
A subset of comments above is included in the new document revision. Notice that this does not mean that other comments will not be taken into account.
Notable change is that an endorsing peer now maintains the state. As a result, there are no committing peers any more, but simply peers, which can have additional roles of submitters and endorsers.
@vukolic transaction flow should be updated after this peer unification
These comments are based on this version.
27) Regarding the changed definition of submitting peers (Section 1.3.2.a), this appears to address the confidentiality issue in previous versions, because submitting peers accept transactions pertaining to specific chaincodes, so submitting peers for a given chaincode can be a subset of endorsers for that chaincode. However, it also puts the burden on clients of knowing which submitting peers can accept transactions for which chaincodes. This means that clients would need to manage issues like figuring out an alternative submitting peer in case the one they have previously remembered is unavailable or misbehaving.
An alternative would be to keep the previous structure of allowing any submitting peer to accept transactions for any chaincode, but to remove the responsibility it previously had to execute the chaincode. Instead, it could identify a "lead" endorser, which would execute the transaction, produce the proposed read/write sets, and request additional endorsers to verify correct execution and endorse the transaction. The submitting peer would only need to wait for a sufficient set of endorsements to show up. (What it does if they don't is a separate issue; maybe it times out and picks another lead endorser.) The submitting peer could inform the client which lead endorser was successfully used, allowing the client to "cache" the result for optimizing subsequent transactions for that chaincode, but would not require such caching or require it to be up to date or consistent. I think this arrangement would better fit the description of "providing an interface between clients and the blockchain", as it removes unnecessary burden from clients. (Also, it would make the "stateless" and "any peer" advertising in Section 1.3.i. more accurate.)
28) Section 1.3.ii seems repetitive coming so soon after the introductory material in Section 1.3. Furthermore, it isn't quite consistent with the earlier material. For example, the "pertaining to a particular chaincode" part of submitting peers that I addressed above is not reflected here.
29) In Section 1.3.iii, I think the specificity of "for transactions and state updates" is potentially misleading and confusing, given that the consenters themselves know nothing about the opaque "blobs" they are ordering.
30) In the second paragraph of the same section, "reliability" is mentioned, but what is described thereafter is really only about atomic broadcast. There is no stated or implied guarantee that all messages offered will eventually be delivered, or anything else that I would consider as "reliability" guarantees. Maybe just say "different implementations may offer different reliability guarantees" or something like that? (Now I see that this is addressed better in later text, so maybe just add "implementation-specific" here.)
31) Also, given that you're going to the trouble to accurately describe what the "consensus fabric" does (atomic broadcast), why not take the opportunity to excise the common abuse of the word "consensus", maybe putting a footnote to avoid confusion for people who are accustomed to (mis)using "consensus"? Maybe even go so far as to rename consenters and consensus fabric, for example to broadcasters and broadcast fabric? But maybe the terminological abuse is widespread enough that there is no point resisting it.
32) I think the remark in Section 1.3.iii is too vague and speculative to be more than a distraction.
33) I think the Safety guarantee paragraphs would be improved if more precise properties were stated first, and then intuitive summaries and observations included later if they are still needed, rather than starting with imprecise descriptions and then refining them with "this means", "note that", "put differently", "in other words", etc. How about this (to encompass both safety guarantees):
There exists some sequence M consisting of messages m_i = (seqno_i, prevhash_i, blob_i), i=0,1,...:
What this doesn't address:
36) Regarding checkpoint validity policies, while it is alluded to in an example in a sub-bullet, I think it's worth pointing out explicitly that a weaker checkpoint validity policy (or combination of local checkpoint validity policies) can undermine fault tolerance properties because it may cause a correct peer to behave "incorrectly".
37) In Section 5.2.i, I suggest s/blocknohash/blockhash/g as the hash is for the block, not the block number, right?
38) I still don't find the checkpoint and state transfer description very convincing or compelling; the previous comments still mostly apply.
@mihaigh - fixed.

@mm-hl:
27+28) The intention was to have submitting peer able to act on any transaction, except those pertaining to confidential chaincodes (that section will appear soon). The text now consistently reflects this.
As for the leader, I would like to treat this separately. After some discussions - and following your concerns as well as following concurrency and programming model impact of version dependencies raised by the community - the idea is as follows: offer both the option of leaderless verDeps (MVCC) as the text now stipulates + leader-based approach. Leader would be elected per chaincode, with the assistance of the consensus service.
While chaincode could implement its own leader election module on top of consensus (total order broadcast), this may be suboptimal and very complicated for chaincode developers, so the idea is to have fabric support for a few typical leader implementations.
Notice that chaincode could opt for leaderless variant (default) or some leader-election policy built in the fabric, or simply implement its own leader election.
Notice also that in such an approach, the current leaderless MVCC is just a special case of the leader function implementation, in which the leader election chaincode deployed at every submitting peer would be

leader(chaincodeID) { return myID }
An actual leader implementation would look at the blockchain state to determine the current leader and act appropriately (per the leader election policy/chaincode) when necessary to help elect the leader.
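A minimal sketch of such a pluggable leader function, with the leaderless MVCC default as the degenerate case. All names and the state encoding are illustrative assumptions, not part of the proposal:

```python
def leaderless(chaincode_id, my_id, state):
    """Leaderless MVCC variant (the default): every submitting peer
    considers itself the leader, i.e. leader(chaincodeID) == myID
    everywhere; concurrency is resolved via version dependencies."""
    return my_id

def state_based_leader(chaincode_id, my_id, state):
    """Leader-based variant: consult the blockchain state (maintained
    via the consensus service) for the currently recorded leader of
    this chaincode; None means an election is needed."""
    return state.get(("leader", chaincode_id), None)
```

A chaincode would then be deployed with one of these policies (or its own), and the fabric would invoke the selected function to decide where to route sequencing.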
I will work out the first draft along these lines and submit - so we can discuss further.
29-32) fixed
33-35) Agree, this is TBD properly. Notice that 34 and 35 are addressed in prose already.
36-38) Changes are pending to the checkpointing mechanism to include the state hash. Will post when ready for review.
@vukolic thanks for the responses.
Regarding 27+28, please note that I was not suggesting a different or additional approach to concurrency, which I agree is a separate topic. My point was only regarding whether a submitting peer could accept a transaction for a confidential chaincode for which it does not have the ability to execute the transaction. The key part of the suggestion is: identify a "lead" endorser, which would execute the transaction, produce the proposed read/write sets, and request additional endorsers to verify correct execution and endorse the transaction. All of that assumes the MVCC model, and just addresses who has the ability to execute and prepare the MVCC-style transaction.
Although I used the word "lead" in "lead endorser", the suggestion does not entail or require any form of agreement or election, and thus none of the complexity you discussed in your response. The "lead endorser" could instead be called the "initiating endorser" and could be chosen at random or via any other policy. A different choice could be made per transaction. Indeed, a submitting peer could even choose to send the "initiation request" to multiple endorsers, again chosen randomly or otherwise, and then wait for a sufficient set of endorsement responses to satisfy the endorsement policy, which might or might not all derive from the same "initiating endorser".
The main change I am suggesting is regarding step 2.2 in the transaction flow: "The submitting peer prepares a transaction and sends it to endorsers for obtaining an endorsement." This is what puts the burden on clients of knowing which submitting peers to target for which chaincodes. It sounds like you're preparing a more detailed treatment wrt confidential chaincodes, so I will try to review again when that appears.
Thanks for the good work!
@mm-hl
ack - the incubating confidentiality draft is doing precisely something similar - it will appear for review/comments/discussion soon (I will of course post a notice here).
This does not change the story re leader in the context of concurrency as discussed in my previous comment. We agree this is a separate issue and will be treated in the separate section of the proposal - so we can discuss it then in more details.
I wanted to discuss here the aspect that causes the most departure from the current architecture - transforming a transaction from its original form (issued by a user - invoke a chaincode method) into a form that includes state updates. This is what requires new functions such as endorsers and committers, new artifacts such as the read-write set/verDep, and also changes the transaction execution flow significantly.

At first sight, it appears to me that the main motivation is confidentiality, as the other advantages, such as separating the consenter nodes, are more about separating the functions performed by a single peer in the current architecture. So, is the main motivation to allow deployment of a chaincode to a specific set of peers while still maintaining a single global ledger (transactions and state)? If yes, then why not instead have a separate ledger altogether for each such confidential chaincode (or trust group)? A peer that does not have the chaincode cannot operate on the corresponding state anyway, and moreover it may be confusing to allow the unrelated peer access to the state data but not the chaincode.

Allowing non-determinism in the chaincode may be another side advantage, but I am not certain whether that could be the main motivation. Similarly, agreeing on state updates beforehand could be another, but I guess checkpointing serves a similar purpose.
So, basically, I wanted to find out the main motivation of including the state updates in the transaction definition which causes a significant change in flow.
@manish-sethi
The high-level motivation is to simplify the fabric implementation. This motivation breaks down as follows (in no specific order): 1) handling non-determinism, 2) allowing more parallelism in chaincode execution (endorsement), 3) providing a simple mechanism for ensuring that a transaction is never executed more than once.
1) Handling non-determinism is an extremely important motivation, esp. when we replicate code written in a high-level language such as golang/Java/etc. With a model w/o state updates (taken, e.g., by the Sieve protocol - http://arxiv.org/abs/1603.07351 - which is the prototype in the current HL fabric), there is always a case in which a non-deterministic transaction would appear deterministic during execution and, as such, be committed to the ledger. However, if one only logs the transaction payload of such a transaction and not its state (updates), a replica repeating the transaction execution sequence (e.g., a new peer) could end up executing the transaction payload and arriving at a divergent state (due to non-determinism). This can of course be handled through state transfer instead of sequential execution, but that complicates the system. Hence the choice of state updates, which yield a simpler (fabric implementation-wise) way to deal with non-determinism at the fabric level.
(as a side note: state updates are directly applicable to Sieve as well, and would turn Sieve into leader-based protocol for handling non-determinism with state updates)
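The difference can be illustrated with a toy non-deterministic transaction: re-executing the payload may diverge across replicas, whereas replaying logged state updates is deterministic by construction. A sketch (names are illustrative):

```python
import random

def nondet_tx(_state):
    """Toy non-deterministic chaincode: its result may differ across
    executions, so two replicas re-executing the same payload can
    end up in divergent states."""
    return {"winner": random.choice(["a", "b"])}

def replay_from_updates(updates_log):
    """Replaying the *logged state updates* (as in this proposal) never
    re-executes chaincode, so every replica reconstructs the same state."""
    state = {}
    for upd in updates_log:
        state.update(upd)
    return state
```

A new peer catching up via `replay_from_updates` reaches exactly the state the endorsers agreed on, regardless of how `nondet_tx` would have behaved if re-run.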
2) As for parallel execution, state updates + version dependencies are nice, since they allow leader-free parallel execution. This is not the only way to implement parallel execution (which can be also leader based, cf. Eve - OSDI'12) but again, it appears as a simpler one.
3) A further motivation is to simplify the implementation with respect to transaction execution: with state updates (and version dependencies) one cannot "accidentally" execute the same transaction twice (see the ZAB paper on ZooKeeper atomic broadcast by Junqueira et al., DSN'11, for a discussion of this).
@vukolic thanks for highlighting the thought process behind the proposal. To discuss these further, can you have a look at the following.
1) For the case of a new peer joining, we can always include the state updates produced by the transaction payload into the block when we execute the transactions in the current architecture. In other words, a block in the blockchain can still look same as it appears in the endorsement based model. However, across live replicas we would rely upon the checkpointing which I think is the case even in the endorsement based model (please correct me if I am wrong here).
2) About parallelism, the proposed approach primarily tries to compensate for the cost of gRPC communication between a chaincode and a peer (assuming these are not high-CPU-consuming transactions), at the cost of additional overheads which include signing and collecting the endorsements, performing additional ledger reads (one during simulation and another during commit, for version matching), and rollbacks in the case of version conflicts. I am sure these overheads may be compensated for in large networks where endorsement is required from a significantly smaller subset of peers. However, for small networks where a larger subset is involved in endorsement, the final performance may be poorer because of these overheads. I am just concerned and wanted to know whether we have some measurements of these costs and some sense of at what configurations (e.g., network size and ratio of endorsers to network size) they start showing benefits.
2) (a) In the past, I was thinking of a simple approach for allowing parallelism in the current code base. Let me write it briefly here and run it by you to see if it makes sense. Below is a rough functioning of this: execute the transactions in a block in parallel and maintain a map {k -> TxId} recording which transaction TxId reads or modifies each key k. If there is no conflict at the end of batch execution (against each key only a single TxId entry is present), commit the transaction results. In the case of a higher number of conflicts, fall back to sequential execution. Further, if there is a smaller number of conflicts, some optimization can be applied, e.g., rolling back and re-executing the conflicting transactions in the relative order they appear in the block (there are corner cases here, but I think they are not critical to discuss right now). Let me know if you think this could be made leader-free by dynamically monitoring the conflicts and relying on checkpointing at a later stage. Up to a certain network size, some simple parallelism like this may result in better performance. But I am not sure whether both approaches can be merged in a single architecture, employing one vs. the other based on the deployment.
As a side note, I believe that the above-mentioned approach of dynamically monitoring conflicts is orthogonal and could be included in the endorsement-based architecture as well, where an endorser collects and executes the endorsement requests in parallel.
3) I haven't read this paper but did not understand the comment about executing a transaction twice by mistake. Is it just about adding a check that looks up the txid in the blockchain before executing again, or something more than that?
@manish-sethi Many thanks for your discussion.
1) We can add state updates to the current architecture, yes, and also to the Sieve way of handling non-determinism. In fact the proposal here is a generalized Sieve, where state updates are moved around instead of the transaction payload (this is rather straightforward to add to Sieve) and we do not have a leader any more (this is more elaborate - hence the version numbers). BTW, I am not sure I understood 100% what you are aiming at here, so I might not be replying to the question.
2) You may be right about a slight overhead (higher latency) when you think of a single chaincode. However, even with a single chaincode, separation of concerns allows more parallelism in execution: your sending money to some account could be endorsed by you only, and my sending money within the same chaincode could be endorsed by me only. So in this case we have parallel execution, whereas in the current architecture we would have a sequential one - hence the speedup (in throughput) even for a single chaincode. The way I see it, we have a much more pronounced problem today with HL fabric throughput than with its latency. Also, for a majority of blockchain applications, throughput will matter, so long as latency is reasonable.
Furthermore, if a single chaincode is doing things sequentially, the current architecture does not help its performance much either. However, with the current architecture of HL fabric, such slow execution of a single chaincode would impact the performance of the entire blockchain, including chaincodes that may be executing/endorsing much faster - because of global sequential execution. Hence, the partitioning of endorsers in this architecture is intuitively better for combining multiple chaincodes in a single ledger.
2a) What you describe is exactly the approach of Eve, OSDI'12 I was referring to. Regarding leaderless vs. leader-based parallelism, there is a tradeoff with respect to the granularity of state that concurrent transactions touch. For fine-grained data models (such as UTXO) we can have a lot of parallelism with the leaderless approach. For more coarse-grained objects, we may want a leader - to, as you say, employ one vs. the other in different chaincodes. In this context please look at my answer above to mm-hl (27+28). This "best of both worlds" of leaderless vs. leader-based approaches is certainly to be added to this design document.
3) This is not the main motivation, but the approach in principle allows for a consensus service that would deliver a transaction more than once, as in "at least once" semantics rather than "exactly once" semantics (although the current specification does not allow this). Namely, even if the consensus service had "at least once" semantics, because of version dependencies such a transaction would never be executed more than once (it is idempotent). The ZAB paper has more discussion on this, arguing that "at least once" is simpler to implement than "exactly once".
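A minimal sketch of why version dependencies make delivery idempotent. All names here are hypothetical, and plain integers stand in for Fabric's versioned key-value state; the point is only that a redelivered transaction fails the version check and becomes a no-op.

```python
# Illustrative sketch: commit applies a state update only if the versions
# the transaction read during simulation are still current. A duplicate
# delivery of the same transaction therefore fails the version check.

state = {"k": ("v0", 0)}  # key -> (value, version)


def commit(readset, writeset):
    # readset:  {key: version observed during simulation}
    # writeset: {key: new value}
    for k, ver in readset.items():
        if state[k][1] != ver:
            return False  # stale version: reject (no state change)
    for k, v in writeset.items():
        state[k] = (v, state[k][1] + 1)
    return True


tx = ({"k": 0}, {"k": "v1"})
assert commit(*tx) is True    # first delivery commits and bumps the version
assert commit(*tx) is False   # redelivered copy is rejected: idempotent
```

This is why an "at least once" consensus service would still be safe under this design: re-applying a delivered transaction cannot change the state twice.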
I have not been able to go through the entire discussion here, but wanted to share initial thoughts at this point.
initial remarks on the text, not content: 1- For a reader, it would be easier if the text was structured to first describe shortly the current architecture and its issues/limitations, and then embark on describing the proposed architecture and how it addresses those issues/limitations. 2- some terms are used with various semantics attached to them, e.g. blockchain: "consent on the blockchain order" (incorrect use?) vs. "deployment chaincodes whose state is part of the blockchain but..." (?) vs. "The blockchain is a distributed system consisting of many nodes that communicate with each other" vs. "The communicated messages are the candidate transactions for inclusion in the blockchain." vs. "a peer appends the transaction to the log (blockchain) and..."
on the content: 1- sections 3.1 & 3.2. While looking at this from a use case requiring the trade content and even trading patterns to be exposed only to the trade stakeholders: how would the endorsement policy be able to support such a pattern? Suppose the data read and produced by the transaction (e.g., the trade details) is only shared between the stakeholders (therefore stored off-chain) while the hash/signature is stored on the (raw) ledger. The hash/signature stored on the (raw) ledger should not allow the identity of the involved stakeholders to be revealed except to the stakeholders. If an endorsement policy defines the endorser set as the stakeholders identified in the transaction itself, would this allow the endorsers to remain anonymous within the system yet establish the required trust (identity) with each other? Can the (certificates used to create the) signatures mentioned in section 3.2 be such that they support this? 2- section 2.3; "Alternative design: An endorsing peer may omit to inform the submitting peer about an invalid transaction altogether, without sending explicit TRANSACTION-INVALID notifications." Note this would require an algorithm to determine a dead node and to define a time-out which also avoids flooding the peers with PROPOSE messages while guaranteeing system responsiveness. 3- section 2.4, paragraph 2; note this paragraph does not distinguish behaviour in case of error conditions. E.g., if the STALE_VERSION answer comes back, it does make sense to let the submitting peer start from step 2.2 again. 4- section 4.1. It would clarify matters to state explicitly that a batch can be configured to have a max. size (max. number of transactions in it) of 1 or more. Note that 1 would basically disable batching.
@vukolic thanks for your detailed reply.
1) I am just trying to weigh the pros and cons of the endorsement-based approach against the other possible alternates. As you agree that a new peer can join the network in the same manner (i.e., by transferring blocks and state updates) even in the current approach, the meaningful difference is in handling non-determinism between checkpoints. In the current approach, you would execute transactions first and then agree upon them; in the proposed approach you would agree first and only then call it a valid transaction. I agree with you on this point that the latter is less complex to implement, but I am not sure about the added complexity because of policies etc.
However, because it changes the transaction definition (since it now needs to include the versioning details), I am not sure whether this could have an undesired implication; I just want to discuss it with the help of an example - could it open the possibility of deliberate attempts to invalidate transactions so they never appear on the blockchain? For example, suppose a transaction transfers some assets from A to B and requires endorsement from both A and B. In the current architecture, the transaction appears in the block, and if the execution results differ at A's and B's nodes, it can only be because of either non-determinism of the chaincode or malicious behaviour, but under no circumstances the result of a correctly functioning fabric. In the endorsement model, A may intentionally never endorse, and the transaction may never reach the consenters because of insufficient endorsements. Even if you allow the transactions to be included without sufficient endorsements (not for committing but just for recording), a meaningful validation is hard (because the rejection cause can always be cited as a version-dependency mismatch during transaction simulation, which is normal functioning of the fabric).
2) I think that you got me wrong here. I was not discussing the parallelism that the endorsement model offers over the current codebase of sequential execution. Rather, again, I simply wanted to weigh the pros and cons of other possible alternatives for enabling parallelism. Sorry if my text was confusing. It's good if you think that both approaches (Eve-based and endorsement-based) have their place and you intend to have them in the architecture. However, more than coarse-grained vs. fine-grained, I had a different dimension in mind, based on resource consumption. Let me explain it here. The execution of a transaction mainly involves executing chaincode (cpu + grpc communication cost) and a disk access cost (dominated by random reads of keys by the chaincode). Now, in Eve-based execution, each transaction executes once at each node (though it is a parallel execution, each transaction's full execution still happens at each node). In the endorsement-based approach, a part of the above-mentioned cost (i.e., the disk access cost) is incurred at each node, and the other costs (cpu + grpc cost) are incurred at a subset of nodes (the endorsers), but with the additional overheads of signing, additional disk costs during transaction simulation, a larger payload, etc. And I was highlighting that the ratio of the average size of the endorser set to the network size would probably be a deciding factor for gaining/losing performance with the endorsement-based approach in comparison to the Eve-based approach. So, when I referred to the additional cost, I was not referring to the added latency but to the fact that in some settings this cost would hamper throughput in comparison to Eve-based parallelism.
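The trade-off being argued here can be made concrete with a back-of-the-envelope cost model. This is entirely illustrative: the coefficients are made up, not measurements, and the function names are hypothetical; it only demonstrates that the endorser-set-to-network-size ratio flips which approach does less total work.

```python
# Toy cost model (made-up coefficients) for total per-transaction work.
# Endorsement model: only e endorsers run the chaincode (cpu + grpc),
# every one of the n peers pays disk/validation costs plus a per-peer
# overhead (signature checks, version matching, larger payload).
# Eve-style model: all n peers pay the full execution cost.

def endorsement_cost(n, e, cpu=10, disk=2, overhead=3):
    return e * cpu + n * disk + overhead * n


def eve_cost(n, cpu=10, disk=2):
    return n * (cpu + disk)


# Large network, small endorser set: the endorsement model wins.
assert endorsement_cost(n=100, e=5) < eve_cost(n=100)
# Tiny network where every peer endorses: the overhead dominates.
assert endorsement_cost(n=4, e=4) > eve_cost(n=4)
```

With these (arbitrary) numbers the break-even point shifts with the e/n ratio, which is exactly the measurement question raised above.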
Finally, what are your thoughts on an orthogonal dimension related to parallelism, where we may not maintain a global ledger at all and let all the components (blockchain/ledger/consensus) exist separately for separate trust groups (say, for each chaincode in a simple setting), with the different trust groups running in parallel? I am just wondering about the value of the transactions and data of a chaincode lying on my ledger when I do not have the chaincode and I cannot operate on that data.
I have to say that this is a rather clumsy approach to collaborative editing and discussion of a document. The mailing list or a googledoc would be better suited to this task. I wanted to make some editorial edits but thought it better to send a marked-up version in Word, since the wiki isn't the best tool for this.
Some general comments: 1 - the document is rather inconsistent in its use of terms. 'consensus service' and 'consensus fabric' seem to be interchangeable. Pick one. 2 - this is really pub/sub, not broadcast. I also found it awkward that we say that we can partition à la topics in pub/sub, then proceed to assume there is but a single channel (broadcast). It seems to me that we should preserve the pub/sub notion throughout (note, I am a big proponent of pub/sub and implemented a global-scale pub/sub fabric/substrate for Sun a lifetime ago). I say this because the paper omits the whole subject of subscribing and substitutes "connects to the channel provided by the consensus fabric". In fact, we are connecting with the consensus fabric and re-establishing our subscriptions and (re)identifying the topic(s) that we might publish to, if we want to be comprehensive in describing what is going on.
Not "broadcast" but "publish". We aren't broadcasting, we are publishing to a topic/channel. A true broadcast would go places the message wasn't necessarily wanted. I would assert that while a very simplistic implementation might send the message to every node, this doesn't scale.
@christo4ferris:
Editing - yes, but this is more like code, one can't simply rewrite a paragraph without considering the rest. And, couldn't you edit the source directly via git?
2 - this is really pub/sub not broadcast
Let me disagree, I am actually in favor of dropping the "pub/sub" terminology. Pub/sub is relevant when you can subscribe to multiple "topics" and usually doesn't care much about strict ordering or delivery guarantees. The design here talks only about "topic" = one blockchain. On the side, the text says this could be pub/sub with multiple channels, but actually offering that (your comments 4-6) will need a deep technical discussion on how to order transactions of different channels w.r.t. each other. We aren't there yet.
What matters most is "consensus" -- the promise that everyone receives the same transactions in the model where there is one blockchain only. That term has been picked up everywhere for this feature of blockchains. Technically, in the relevant literature that I cite from and have contributed to, the appropriate term for it is "atomic broadcast" or "total-order broadcast" because it implies agreement on an ever-growing sequence of messages with transactions. Calling this "consensus" is slightly problematic because "consensus" also means the single-instance primitive, the one where the system ever only agrees once; but this confusion is all over the literature, hence people should look deep enough to understand the difference between, say, "paxos" and "multi-paxos". Hence I can also live with calling it "consensus service".
I suggest we build this first with one "channel" as discussed here, and once there, move to a "multichannel" design and call it "pub/sub service" then.
I can, but it felt awkward just making edits directly without discussion... a PR is a different beast but with a document, there is an editor and contributors and from my experience, googledocs has it about right for collaborative editing of a document allowing in-line comments and suggested edits.
@vukolic I intended to say something in earlier discussion that is close to what I interpret @manish-sethi to be saying in part 2a, and I don't agree with you that it is exactly the approach of Eve, OSDI 12.
My interpretation of what @manish-sethi said is that transactions in a block should be executed "as if" they were executed in sequential order. In Eve, transactions are executed in parallel without such a requirement, allowing for the possibility that different peers get different results. Then, further mechanisms are required to cope with such divergence (the Verification stage), and additional measures are needed to make (excessive) divergence less likely (the Mixer).
If transactions in a block should be executed "as if" in order, then optimistic techniques, such as the one @manish-sethi sketches and others including transactional memory, can be used, and if they are not successful, then we will fall back to a less optimistic method (maybe eventually to fully pessimistic), but we will not complicate the rest of the protocol by allowing the divergence to escape from the local execution. For use cases in which conflicts are rare or nonexistent (perhaps due to successful "mixing" or just because of the nature of the use case), the overhead should be quite low and the parallelisation profitable. When conflicts are more common, it's not clear to me that allowing the divergence to propagate is a win.
To put this in context, let me repeat that I'm unconvinced by the value of and motivation for supporting nondeterministic chaincodes. Saying that it's useful if we write chaincodes in languages such as Go and Java isn't motivation for nondeterministic transactions, it's motivation for finding better ways of expressing chaincodes/smart contracts. The Eve approach introduces nondeterminism even if there is none to start with, thus creating a problem that then needs to be solved. Unless and until someone convinces me otherwise, I think chaincodes should be deterministic, preferably (eventually) enforced by the language or whatever is used to express them and/or formal methods, so I don't think we should prefer mechanisms that introduce nondeterminism, and we should not justify their need by an assumption that chaincodes will be nondeterministic either. If we didn't feel the need for better approaches to expressing chaincodes last week, we certainly should this week :smile:.
@mm-hl @manish-sethi There are at least two things here: 1) @manish-sethi 2a vs Eve and 2) non-deterministic chaincodes. Let me tackle these separately.
1) I re-read @manish-sethi's 2a) and I still see it as a special case of Eve, since parallel execution succeeds only "if there is no conflict (against each key only a single TxID entry is present)", whereas Eve could sometimes succeed in executing in parallel even if more than one TxId per key is present, if the mixer does a good job partitioning requests. Both approaches fall back to sequential execution if this is not the case. Notice that Eve's mixer can be optimized if it knows exactly which objects a tx modifies - which is the information @manish-sethi's 2a) apparently has - in which case it could be made never to make mistakes, and then it would never produce divergent results, so the subsequent complexity could be avoided. Yet Eve does not have this assumption - so it is more complex.
2) As for non-deterministic chaincodes. I fully agree that in an ideal world these must never come to the fabric as non-deterministic. But we are simply not there yet, as HL (fabric) currently does not have a DSL that would disallow non-determinism (I do agree, fully, that in HLP we do need such DSLs). Yet, until that time comes, the fabric needs to ensure it protects against a trivial DoS in which somebody deploys a non-deterministic chaincode, issues a non-deterministic tx, and puts the peers in divergent states.
As a side note, cf. last week's events, it appears to me that the smart contract that caused those issues was in fact deterministic - but simply not well understood by its designers/developers. Namely, every time that "attacking" sub-contract was executed, on whatever peer, it would produce the same results - so it is deterministic.
@manish-sethi
Finally, what are your thoughts on an orthogonal dimension related to parallelism where we may not maintain a global ledger at all and let all the components (blockchain/ledger/consensus) be there separately for separate trust groups (say for each chaincode in a simple setting) and different trust groups run in parallel.
Sharding is certainly a technique we can apply to the fabric - and the plan is to eventually do so in HL fabric. The way I see things, doing simple partitioning is trivial - and we can easily do it. Yet, sooner or later, one would need to come up with some semantics/design to support cross-partition transactions (i.e., cross-subchain transactions), which is non-trivial. Without cross-partition txs, we can very well have the parallelism you mention, in both the current HL fabric architecture and the next proposed one (i.e., this one).
@JoVerwimp Thanks for your comments (apologies for higher latency)
non-content comments
content:
Thanks @vukolic (attn @elli-androulaki) on content point 1, not exposing the stakeholder's identity, is there a possibility of using (something similar to) transaction certificates for producing the endorsement signature?
The endorser would get a 'transaction certificate' from the CA, which later allows for validation of the signature without exposing the identity.
minor revision posted addressing some terminology consistency comments from @christo4ferris and @JoVerwimp, comments 6, 13 and 14 from @mm-hl, and a few other minor changes.
@vukolic regarding @manish-sethi's 2a), you point out that both it and Eve "fall back to sequential" in case there are conflicts. Right, but these happen at quite different levels (according to my interpretation of @manish-sethi's comments, which admittedly may be colored by my own thoughts in similar directions). With Eve, if conflicting results occur, then a higher-level protocol tries to choose one of them, and if this fails, falls back to sequential execution. In contrast, in the "2a" idea, the "falling back to sequential" happens within each peer that encounters the conflicts, so the end result is that all honest peers determine the same result (some may have succeeded with concurrency optimizations while others may have fallen back to sequential). Thus, there is no allowed divergence, thus no introduction of nondeterminism that wasn't already there, and a simpler protocol, because all honest peers get the same result for each block, even if it is executed in parallel.
Regarding smart contract languages, nondeterminism, etc., seems like we're on the same "ideal" page. I did not mean to suggest that nondeterminism was directly to blame (and agree it wasn't) but rather to point out that the recent events show very clearly that we need better language support for smart contracts, so it bothers me a bit to see design directions seemingly being pursued that bake in some of what I view as harmful aspects of the current pragmatic choices that should preferably be eliminated in time.
@mm-hl yes, your interpretation of what I had in mind while writing "2a" is correct (i.e., the execution at each node is independent) - though I am not sure about execution on one node observing no conflict while conflicts were observed at some other node (assuming all transactions are processed on the committed state as of the last block commit). In fact, they would observe the same conflicts, because a chaincode would read/write the same data if executed against the same state (assuming deterministic code). However, I would like to elaborate a bit further on falling back to sequential execution. If there are a smaller number of conflicts, the approach was to roll back and re-execute only the conflicting transactions, in the relative order they appear in the block. (This requires monitoring for fresh conflicts with earlier non-conflicting transactions - hopefully a rare situation, but still required for correctness.) So, falling back to sequential execution of the whole block was a worst-case choice based on thresholds: 1) the number of conflicts in the first round, 2) the number of iterations that produce fresh conflicts when re-executing only the conflicting transactions.
Section 2.5, 2nd paragraph: Reading the description I'm wondering if the proposed method covers all possible isolation levels.
Consider a scenario where two transactions (A, B) are entered and part of the chaincode business logic is to see if the other transaction exists and, if it does, to mark each as matched. The chaincode is implemented by storing each of A and B as a key-value pair with keys A and B.
When item A executes, it searches for B but does not find it; likewise, when item B executes, it searches for A.
If the consensus model only considers the read variables and the updated variables, then if A and B each executed on versions of the world state that don't contain the applied state of the other transaction, and the state updates are then combined in the same block, it is possible for both A and B to be entered into the system in an unmatched state.
The problem is clearly the “search” for items that don’t yet exist / are not committed.
One solution would be to partially serialise execution. In the example above, it might be possible to insist that these transactions are executed serially on the same endorsing peer; however, this would need endorsing peers to execute subsequent transactions against the proposed world state rather than against the committed world state.
Is this the proposal in 2.3?
Another solution would be for the application to store A and B under the same key, say AB; however, this imposes restrictions on the programming data model and moves to a model of few keys with large blobs of data containing internal structure, which may itself introduce performance problems.
A final solution might be to record the search criteria along with the keys read or written, and then, in the consensus model, to check whether any keys have been entered into the searched range between execution and commit.
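The "record search criteria" idea could be sketched like this. This is illustrative only, not a proposal for the actual design; `validate` and its inputs are hypothetical names, and keys are compared as plain strings.

```python
# Illustrative sketch of phantom detection via recorded range predicates:
# a transaction records the ranges it searched during simulation; at
# commit time, the validator rejects it if any key committed since
# simulation falls inside one of those ranges (catching the A/B case
# above, where a matching item appeared only after the search ran).

def in_range(key, lo, hi):
    return lo <= key <= hi


def validate(tx_ranges, keys_committed_since_simulation):
    for (lo, hi) in tx_ranges:
        for k in keys_committed_since_simulation:
            if in_range(k, lo, hi):
                return False  # a phantom appeared in a searched range
    return True


# The tx searched the range ["A", "C"] and found nothing; meanwhile "B"
# was committed by the other transaction, so this tx must be invalidated.
assert validate([("A", "C")], {"B"}) is False
assert validate([("A", "C")], {"X"}) is True
```

This extends the read-set check from individual keys to predicates, which is what is needed to detect "searches for items that don't yet exist".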
@roger505 I am not sure I understand the issue
your "chaincode" seems to have the following two transactions:
A) A=1; if B==\bot then executeSomeWinnerCode
B) B=1; if A==\bot then executeSomeWinnerCode
i.e., if they execute concurrently on the initial state we have:
A.readset={(B,\bot)}, A.writeset={(A,\bot)}, A.stateUpdate={(A,1)}
B.readset={(A,\bot)}, B.writeset={(B,\bot)}, B.stateUpdate={(B,1)}
as consensus orders one transaction before the other, the first transaction (per raw ledger order) is valid and committed (regardless of isolation level). Notice that there is an order among transactions even within a single batch.
In case of serializability the second (per raw ledger order) tx is invalid. Under some other isolation level (e.g., SI) the second tx could be valid as well.
let me know if I am missing something
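A toy sketch of this validation argument under serializability. All structures here are hypothetical, and read values stand in for Fabric's read-set versions; it only shows that, given the raw ledger order, the first of two conflicting transactions commits and the second is invalidated.

```python
# Illustrative sketch: apply an ordered batch, checking each transaction's
# readset against the state as left by earlier transactions in the batch.

BOT = None  # stands for "\bot": the key is absent


def apply_batch(state, batch):
    results = []
    for readset, update in batch:  # the batch is already totally ordered
        if all(state.get(k) == v for k, v in readset.items()):
            state.update(update)
            results.append("valid")
        else:
            results.append("invalid")  # serializability: readset is stale
    return results


# Both A and B were simulated against the initial (empty) state:
batch = [
    ({"B": BOT}, {"A": 1}),  # tx A: read B==\bot, state update A=1
    ({"A": BOT}, {"B": 1}),  # tx B: read A==\bot, state update B=1
]
assert apply_batch({}, batch) == ["valid", "invalid"]
```

Under a weaker isolation level (e.g., SI, as noted above), the check for the second transaction could be relaxed so that both commit.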
@manish-sethi yes, there could be many strategies for optimizing execution of a batch of transactions. The only requirement would be that they are executed "as if" sequentially, which is what's required to avoid the complexity (e.g., in Eve) that arises from multiple honest peers getting different results for the same batch.
@vukolic I agree that the example you quote will be safe; however, in the use case I was thinking of, the matching is done with a range search rather than an explicit read, as there is a "matching tolerance". How would the code represent a "RangeQueryState" where no key is found? I agree that under some isolation levels this would not be protected against; I was just keen to understand whether in this design it is possible to configure the system so it is safe, even if that is at the expense of some limitations, such as forcing potential problem transactions to have the same endorsers.
My thoughts - would it be good to consider that the identities involved in a transaction be verified during this process? If consensus and endorsement are performed on transactions, the transactors involved should also be verified in a regulated environment. Endorsing the "content" of the transaction may be only a part of the solution; endorsers would also need to validate the identities - as an easy example: OFAC checks.
Is it possible to explain the confidentiality model further please? I notice that the tran-proposal contains the spID and the clientID, and that the tran-proposal is included in a blob, and the blob is included in the chain.
Would the spID and clientID not reveal something about the submitter of the transaction?
How does this fit with transaction certificates to obscure the origin of the transaction?
@roger505 more details on the confidentiality are following... Pls stay tuned.
UPDATE: In the meantime, clientID is gone, as it is actually not necessary. In principle, spID should not reveal anything with respect to the client nor leak any useful info. It is however useful, as it denotes the peer who computed the transaction results, and may prove useful for accountability (e.g., a peer submitting transactions with invalid endorsements) as well as helping to uniquely identify a transaction.
Use the comment field to discuss the next consensus architecture proposal.