hyperledger-archives / fabric

THIS IS A READ-ONLY historic repository. Current development is at https://gerrit.hyperledger.org/r/#/admin/projects/fabric. Pull requests are not accepted.

Next Consensus Architecture discussion #1631

Open vukolic opened 8 years ago

vukolic commented 8 years ago

Use the comment field to discuss the next consensus architecture proposal.

gregmisiorek commented 8 years ago

for section 2.1, we should try to use a UTC timestamp if possible: tsUTC.
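
For illustration only, a minimal Go sketch (Go being the fabric's implementation language) of taking such a timestamp; the variable name tsUTC just follows the comment above, and nothing here is prescribed by the proposal:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Minimal sketch: take the transaction creation timestamp in UTC so
        // that peers in different time zones produce comparable values.
        tsUTC := time.Now().UTC()
        fmt.Println(tsUTC.Format(time.RFC3339Nano))
    }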

gregmisiorek commented 8 years ago

for section 2.2, 1st alternative design, HASH(txPayload) seems preferable as we should try to avoid clear text wherever we can without losing the clarity of the specification.

adecaro commented 8 years ago

Just some clarifications: In Section 1.2.i, should 'Clients create and thereby invoke transactions.' read 'Clients create and thereby invoke chaincodes'? In Section 2.1, should 'To invoke a transaction' read 'To invoke a chaincode'?

binhn commented 8 years ago

@gregmisiorek yes, hash would be preferred, especially in the case of confidential transactions. @adecaro at this level, we should use transaction instead of chaincode to be more generic.

bmandler commented 8 years ago

A couple of comments / thoughts

  1. The struggle to keep a blurry distinction between the different kinds of peers seems to add unnecessary confusion without a real need (examples of how this complicates the text can be seen in the last paragraph of section 1.2, "Nodes other than committing ...", and in section 2.3, first bullet, second paragraph). For simplicity I'd unify these entities and state that there is an entity called peer which has 3 distinct capabilities, namely endorser, submitter, and committer. To my understanding all peers can play all roles and their specific role is determined per chaincode (so a peer can be an endorser for one chaincode and a submitter for another, but I think that all need to be committers). Since I think that all peers need to maintain the state and all peers should be able to communicate with clients (see 1.2.i: the client may connect to any peer of its choice), I don't see a real value in the separation.
    • I believe an endorsing peer has to maintain the state as well, first because it may very well be needed for endorsements, and second because it's required for simulating the transaction execution (and is required to respond to a query request).
    • In section 1.2.ii.a it's stated that a committing peer may give clients read access to the blockchain and state. Once again, this means that all committing peers are submitting peers; and in addition, if read access to the state means responding to chaincode queries, then only the endorsers (and potentially the submitting peer) can respond, since they are the only ones that have the actual chaincode in question.
  2. The flow of deploying a transaction is not detailed. I assume the deployment takes place only on the endorsing peers specified by the endorsement policy and on the submitting peer. What happens if a subsequent transaction invocation for the same chaincode comes through a different submitting peer? That peer may not have the chaincode deployed and thus won't be able to simulate the execution of the transaction.
    • It is stated that deploy transactions are controlled by the system chaincode rather than by the endorsement policy of the specific chaincode being installed. Why is that?
  3. The flow of a query request is not detailed. Is it serviced by the submitting peer? Does the submitting peer involve the endorsers? Does it go through the consenters at all?
  4. Section 2.3 second bullet b (STALE_VERSION): if the submitting peer receives back such an answer it can drop the transaction, or restart the process from the beginning, rather than send it to the consenters even if the endorsement criteria were met, since it will fail at commit time. Naturally this gets complicated when taking into account that an endorser can be Byzantine.
  5. All peers need access to the metadata of all chaincodes (to verify the endorsement), and require access to all the endorsement policies of all chaincodes. Is this distributed to all peers at chaincode deploy time?
  6. Confidential chaincode is supported by having the chaincode deployed only on the endorsers. The assumption is that there are no Byzantine endorsers? Otherwise, a Byzantine endorser can do anything with the chaincode, and communicate it to whoever he wants.
  7. How is a Byzantine submitting / committing peer handled?
  8. In the description of the raw ledger there's the following sentence which I don't understand: It also provides submitting peer with information about invalid transactions ...
  9. As for Binh's earlier remark on non-determinism: this is captured by the endorsers and reported back to the submitting peer (INCORRECT_STATE), so the submitting peer can take an appropriate action based on that response. Personally, I'd vote for dropping such a transaction since I don't see the value of supporting non-deterministic transactions within a blockchain. The problem that arises then, of course, is how to handle a Byzantine endorser that replies with such an error.
vukolic commented 8 years ago

@bmandler Many thanks for the very insightful comments! Let me respond quickly - addressing them fully requires more work, as discussed below.

1) Good point - and this was originally the case in the design, until there was a remark that endorsers may be more general peers that do not necessarily maintain the state. But in principle I agree that we could simplify, and an endorser will most of the time actually be a committing peer.

2) The idea is that transaction deployments are seen as an invocation of a system chaincode, where a certain set of (system chaincode) endorsers may need to endorse that a transaction is deployed. I again agree, the flow of the deploy is a todo.

3) There was already a comment on this from @christo4ferris - it will be added (todo).

4) In principle, this would be an implementation optimization (there will be many of these). But, as you correctly point out, depending on the trust model for the endorsers of this chaincode, this optimization may be applicable only when a certain number (or set) of endorsers reply with STALE_VERSION (cf. Byzantine endorsers). I suggest to address this by adding a comment along these lines to the design draft.

5) This is why deployment might need to go to all committing peers, not just to endorsers as you suggested in 2) - this will need to be made clear when the Deploy flow is outlined.

6) With the confidentiality as enabled by this design, confidentiality would be violated by Byzantine endorsers specified by the chaincode. Notice that the integrity guarantees of such a chaincode execution could be maintained despite confidentiality violations. If this level of confidentiality is insufficient, chaincode can/should resort to application (chaincode) level confidentiality techniques (which the design does not prevent).

7) Dealing with a submitting peer that submits a tx with a wrong endorsement is handled with endorser signatures. As for interactions with consensus - it is the job of the consensus module to deal with Byzantine submitting peers (as Byzantine consensus clients) and Byzantine committing peers. Byzantine committing peers are simply learners in Paxos wording, so it is easy to tolerate any number of Byzantine committing peers from the consensus perspective. Dealing with Byzantine committing peers for queries will need to be spelled out - when we spell out queries.

8) This means that a submitting peer, acting as a committing one, will be able to deduce an invalid tx and act upon it as agreed with the client (cf. retryFlag, etc.)

BTW, if you want to volunteer for writing one of these TODOs (e.g., deployment/queries/anything else) - pls ping me so we coordinate. Thanks again!

mihaigh commented 8 years ago

Typos:

  1. on section 1.2.ii. point b it is written "on behalf one or more clients for executing the transaction." instead of "on behalf of one or more clients for executing the transaction."
  2. on section 1.2.iii. it is written "and can then send messages and obtain see messages that arrive" instead of "and can then send messages and obtain the messages that arrive"
mm-hl commented 8 years ago

Hi guys

Thanks for the good work on the Next Consensus Architecture Proposal. I like the general direction of moving away from a monolithic architecture, allowing for more flexibility and separation of concerns. I have a bunch of comments at varying levels of detail. They are listed more or less in order of the relevant parts of the proposal. These comments were written while reading this version. Please excuse the ugly long numbered list but this may facilitate easier referencing for further discussion

  1. The name "submitting peer" seems potentially confusing, because it is in some sense the client that "submits" the transaction. (Note who's doing the submitting in this sentence: "A submitting peer is a committing peer to which a client can directly connect to submit transactions, providing an interface between clients and the blockchain.")
  2. Perhaps the term "accepting peer" (or just "acceptor") would be useful, as it accepts transactions that clients submit.
  3. In Section 1.2, this seems contradictory and is at least confusing: "All peers usually maintain the state, except for endorsers. In a typical deployment endorsers will always be committing peers as well and, hence, maintain the state." (due to the use of "usually" and "typically" for two sentences that contradict each other).
  4. While the comments regarding nondeterminism at the beginning of Section 2 make sense, it's unclear to me that it would ever make sense to trust submitting nodes not to be Byzantine while still bothering to have all this other blockchain infrastructure. Unless there is a sensible such scenario, the insistence that determinism isn't required seems to merit a technical footnote at most, not a high-level, introductory comment that makes it sound like there is some significant advantage due to not requiring determinism. (I'm yet to be convinced that nondeterministic chaincodes are worth the substantial trouble they bring, nor that there are convincing solutions to support them; happy to be convinced otherwise.)
  5. In Section 2.2, there is no discussion of the motivation to convert the transaction into a state update and dependence version representations. As noted below, there are some downsides to this, and the upside is not explicitly stated, making it difficult to weigh the perceived pros and cons.
  6. It is unclear exactly what the state update is and exactly what verdep is. One can read between the lines and figure out what is probably intended, but it would be better to be explicitly clear.
  7. What would be the point of having a tran-proposal contain a hash of the txPayload? It provides limited transaction confidentiality, given that state update and version dependencies are included anyway, and it doesn't quite guarantee unique transaction IDs either, given that some transactions might not update any state.
  8. Is it a conscious choice to have some signatures cover the message/statement type (e.g., TRANSACTION-[IN]VALID) and some not (e.g., PROPOSE, SUBMIT)?
  9. The alternative design of having the consensus fabric do the endorsing as well seems like it would not maintain a clean separation of concerns, and also seems that it may undermine one of the stated advantages of having endorsers in the first place, namely improving scalability by allowing different sets of nodes to be endorsers for different chaincodes.
  10. In Section 2.3, would it make sense to allow more generality for rejection reasons, particularly to distinguish between error conditions and policy-based rejection?
  11. Also, many/all of the design alternatives mentioned in Section 2.3 seem reasonable in different scenarios, and also multiple combinations of roles for different peers make sense in different scenarios. So I am wondering if a little more generality in the protocol might accommodate all of them cleanly without the protocol dictating specific choices about who sends what where. For example, messages could include indications of whether notification is required if the tx is invalid, whether TRANSACTION-VALID/TRANSACTION-INVALID statements should be sent to the endorser or directly to a committing peer, evidence of endorsements gathered so far, etc.
  12. I like the generality implied by the consensus fabric just getting "blobs" and ordering them. For example, at first I thought this would allow Hyperledger fabric to support variations on the theme exemplified by BigchainDB, where consensus determines order of blocks, and block validity is determined later by peer voting. But Section 2.4, is more specific than this, seemingly saying that blob=transaction. (More about this below now that I've read Section 4.1.) Similarly, Section 2.5 refers to consensus fabric delivering transactions, which again seems as if it may be overly specific. Maybe I am overgeneralizing the intent of the overall project and the "blob" language is just to emphasize that the contents of the transactions are opaque for consenters. It's an interesting question to what extent the project should attempt to support a broader range of possible implementations. I'm not sure to what extent that kind of thing has been discussed/agreed, but there is clearly a spirit of supporting various pluggable things, so I wonder if that could/should also extend to having sufficient generality to allow, for example, submitting peers to construct blocks of transactions, and committing peers to unpack them and process them. If a decision is made to preclude this possibility, it may warrant more discussion, motivation, validation, etc. For example, to what extent would such a decision impact communication overhead?
  13. This sentence in Section 2.5 is confusing: "For example, once can provide serializability when for all keys in the readset and writeset, the current version in the state is still the same as the one in the readset, and all other transactions are rejected."

    • s/once/one/ ?
    • As far as I can see, there is no assumption that the writeset is a subset of the readset, so it doesn't make sense to require a key in the write set to have a version equal to "the one in the readset"; I think the intention is that, for a key in the writeset, the version in the state equals the one in the writeset.
    • Also the phrasing is funny, such that it seems on first reading like "all other transactions are rejected" is part of the condition described in the sentence, which doesn't make sense.

    How about: "For example, serializability can be provided by requiring the version associated with each key in the read set or writeset to be equal to that key's version in the state, and rejecting transactions that do not satisfy this requirement." ?

  14. The discussion of different isolation levels makes me uneasy. A few observations:
    • The idea of converting transactions into readsets and writesets early on (before endorsements are gathered, for example) and then rejecting a transaction if it does not still apply against the same versions witnessed earlier creates a window in which a transaction may be rejected due to concurrent updates when this is not strictly necessary. For a simple example, imagine transactions that increment a counter stored against some key. In systems such as the current Hyperledger fabric, Ethereum and others, where the chaincode/contract is executed when the transaction is validated, two transactions that increment the counter and are submitted around the same time will be considered in order and can both succeed. In contrast, with the proposed arrangement, the two transactions may see the same state and by the time they are processed, one of them must fail, even though there is no reason from the point of view of the chaincode's semantics why it must fail.
    • "Optimizations" such as considering weaker guarantees (e.g., snapshot isolation) may mitigate this to some degree, but also introduce cognitive overhead for a brand new class of programmers in that they have to be aware of the possible semantic implications, and if they are not acceptable, work around them explicitly in a way that cancels any performance improvement that might have been offered by implementing a weaker guarantee. Most likely, people will often overlook this issue, maintain a mental model of serializability, and it will be fine most of the time. Right up until it isn't.
    • There seems to be no reason for a decision on one guarantee. For example, different smart contracts could provide different guarantees, if it really made sense to offer different ones. (Like the proposal, I am not considering cross-chaincode transactions here, but I think at least common combinations would work and provide the weaker guarantee.) Overall, if a decision is made to support a weaker guarantee, this decision should be supported by concrete evidence of a tangible benefit, and explicit acknowledgement of the potential downsides and risks and plans to mitigate them (e.g., via documentation and education). Especially if the decision is to support only a weaker guarantee.
  15. In Section 3.1, "the endorsement policy is a predicate on the transaction, endorsement, and potentially further state" sounds slightly circular to me at first. It might be clearer if "endorsement" were replaced with "endorsement(s)" or "endorsement message(s)". I think the idea is that we could have a policy that states, for example, if any 2 of these 3 endorsers endorse the transaction, then it is considered endorsed. It is not clear to me where these individual endorsements would fall in the list of data to which the policy may refer. Presumably not in the "and potentially more" bullet for the foundations of such a simple policy. Yes, it would involve their keys/identities but that doesn't refer to the fact that they have endorsed this specific transaction. Presumably it is intended that the endorsement policy would depend on the messages sent by endorsers, as described in Section 2.3.
  16. The last bullet in Section 2.5 says "It is important to note that invalid transactions are not committed, do not change the state, and are not recorded.", which appears to contradict the first bullet in Section 4, which says "The raw ledger contains information about both valid and invalid transactions and provides a verifiable history of all successful and unsuccessful state changes and attempts to change state."
  17. The second bullet in Section 4 seems incomplete: in addition to filtering out invalid transactions, the effects of valid ones are applied to the state.
  18. What does it mean for the raw ledger to be "mandatory"? If I don't store it, how can you tell and why do you care? Do I really have to store all invalid transactions (which includes transactions that are perfectly fine except that they became invalid because someone concurrently updated some state that they access)?
  19. What is the added value of the validated ledger if the raw ledger is "mandatory"?
  20. In Section 4.1, we learn that the consensus fabric can form blobs. When I mentioned the BigchainDB approach above, I was imagining that submitters would form blocks of transactions, and submit these as blobs to the consensus fabric, which would simply produce an authenticated ordering of blobs. That seems cleaner to me, keeping the consensus fabric focused on its core business, and not getting involved in policies of what size blobs to create, how to represent them, etc. What's the reasoning behind putting this into the consensus fabric instead?
  21. This seems contradictory: "Consensus batching does not impact the construction of the raw ledger, which remains a hash chain of blobs. But with batching, the raw ledger becomes a hash chain of batches rather than hash chain of individual blobs."
  22. Section 5 refers to "state" but is really talking about ledger data, not the "(Blockchain) state" referred to by the second bullet of Section 4. This could be confusing. And I think there is a deeper issue here than just presentation, which I will explain below, after a few other comments.
  23. While the checkpointing approach allows pruning of raw ledger data so that invalid transactions do not need to be stored indefinitely, the first bullet in Section 5.2.iii has a fallback case that depends on other peers having retained the raw blocks unless a trusted checkpoint is available. So we could not guarantee that both: i) all invalid transactions can eventually be pruned; and ii) a new peer can always start without depending on finding a trusted checkpoint. Maybe that's OK, but these issues really make me question the motivation for storing the invalid transactions in such a way that they affect the hashes required to verify valid transactions.
  24. If we need to retain invalid transactions, why not record them separately? For example, directly storing the validated ledger, and including in each block's header the Merkle root of a structure storing the invalid transactions, would allow the invalid transactions to be discarded at will without interfering with the ability of peers to catch up with the validated ledger. If the invalid transaction sets for each block also contained indications of the order in which invalid txs were processed wrt the valid ones, this would allow reconstruction of the "raw ledger" by anyone who chose to retain the invalid txs. Block headers of the valid ledger could also include hashes of associated raw blocks, allowing verification of the reconstructed raw ledger. (I am not suggesting that all this is needed, just pointing out that it's possible to reconstruct and verify the same information recorded in the raw ledger in the current proposal.)
  25. This approach separates the issues of retention of invalid transactions and recovery from checkpoints (or replaying from the beginning in case no trusted checkpoint can be found). However, this still does not address the deeper issue I alluded to above.
  26. While the proposed checkpoint approach allows a catching-up peer to skip processing invalid transactions for the missed blocks, it still requires it to process all valid transactions. For a new peer, this means requesting and processing all valid transactions since forever, perhaps years worth of transactions, just to be able to start operating. Some blockchains, like Ethereum for example, include a Merkle root of the state tree in block headers (and store block's transactions separately from the headers). This enables peers to provide Merkle proofs of state reported to light clients, and, more directly relevant to this proposal, also enables "fast sync", whereby a new node can request just headers from peers and then receive a representation of the recent blockchain state from (a) peer(s), which it can validate against the state Merkle root from the associated block header. In the case of Ethereum, this allows a new node to be started in hours, rather than quite a few days, and this is after less than one year of operation. If the validated ledger included such state Merkle roots, then checkpoints would similarly contain sufficient information to allow a peer to request the state (as in the KV store) from (a) peers(s) and verify that it is the correct state for the checkpointed block, thus avoiding the need to receive and process all valid transactions since forever. If a trusted checkpoint cannot be found by a catching-up or new peer, then it can still request all the valid transactions and process them all, regardless of whether everyone else has retained invalid transactions. Is there some compelling motivation or advantage associated with the proposed approach that I am missing?
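
To make the fast-sync idea in comment 26 concrete, here is a minimal sketch in Go of a validated-ledger block header carrying a state Merkle root; all type and field names are illustrative assumptions, not part of the proposal:

    package ledger

    import (
        "crypto/sha256"
        "encoding/binary"
    )

    // Hypothetical validated-ledger block header. Including StateRoot (the
    // Merkle root over the key-value state after applying this block) is what
    // would let a new peer fetch a recent state snapshot and verify it against
    // a trusted checkpointed header, instead of replaying every valid
    // transaction since genesis ("fast sync").
    type BlockHeader struct {
        Number    uint64   // block sequence number
        PrevHash  [32]byte // hash of the previous block header
        TxRoot    [32]byte // Merkle root over the block's valid transactions
        StateRoot [32]byte // Merkle root over the state after applying the block
    }

    // Hash chains headers together; since the header commits to TxRoot and
    // StateRoot, hashing it is enough to commit to the block's contents.
    func (h *BlockHeader) Hash() [32]byte {
        buf := make([]byte, 8, 8+3*32)
        binary.BigEndian.PutUint64(buf, h.Number)
        buf = append(buf, h.PrevHash[:]...)
        buf = append(buf, h.TxRoot[:]...)
        buf = append(buf, h.StateRoot[:]...)
        return sha256.Sum256(buf)
    }

A checkpoint would then only need to reference such a header: a new peer fetches a state snapshot from any peer and recomputes StateRoot to verify it, as the comment suggests.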

Hope some of this is helpful.

vukolic commented 8 years ago

@mm-hl Thanks! I am again going to focus in the response only on vital things:

3) To be fixed - probably by making endorsing peers also committing peers.

4) Ack - to be fixed.

5 + 14) Let me tackle this one early on: the approach to handling concurrency is a critical one. Version dependencies were chosen after weeks of (sometimes heated :) discussion with @chetmurthy and @cca88. Although this one probably deserves a separate issue, let me elaborate a bit here. In principle, we have two approaches to handling concurrency at hand.

a) Version dependencies (verDep): these are typically used in the context of databases, where a concurrent change to an object is conditional on the object not changing in the meantime. This provides the liveness level typically called obstruction freedom in distributed computing. In DB parlance we need to specify isolation guarantees, which, as you correctly point out, may be specified per chaincode and not globally.

b) Sequencing through a leader (leader): in this approach, taken by many distributed systems, there would be a leader endorser that sequences concurrent requests and resolves concurrency. This approach is taken by the Sieve protocol that is the current HL fabric prototype consensus protocol (http://arxiv.org/abs/1603.07351).

Both approaches have pros and cons. Again, this requires a separate issue, but I will try to put out the essence here. "verDep" PROS:

mm-hl commented 8 years ago

@vukolic thanks for the quick and detailed response. Minor responses in this comment, longer discussion in the next.

7) Doesn't seem to address my comment, not sure what you are saying.

8) All of the things mentioned (including PROPOSE and SUBMIT messages) contain signatures, according to my reading of the proposal. My comment is about what is covered by the signature and specifically about why the message type is covered in some cases and not in others. I can see that it's potentially important for the signature to cover the TRANSACTION-[IN]VALID, so nobody can lie about the determination, but I can't see that it's important not to cover PROPOSE and SUBMIT, and all else being equal consistency is probably preferable. No big deal, I was just curious if this was intentional.

9) My comment doesn't count as a vote to keep the roles of consenters separate from committers. It's a suggestion to keep the functionality separate, and (along with comment 11) to generalise the protocol to cleanly cover a wider range of configurations and not have to specify what gets sent to whom and in what order.

15) Yes I think that would be clearer.

20) Sorry, yes I meant "batch" where I said "blob" (first instance in comment 20).

21) Above typo notwithstanding, I stand by my comment that the sentence seems contradictory. (Minor nit, just trying to help improve presentation.)

22 + 26) Not sure what you mean. If it is "required", for whom does this create an obligation? At what granularity might one opt out if it were optional? Per implementation? Per network? Per peer?

23) I emphasize "in such a way that they affect the hashes required to verify valid transactions" in my comment. I agree that making the consensus fabric process opaque blobs means that it has no option but to include invalid transactions in its output. That does not prevent committers [edit: this previously mistakenly said "consenters"] (who can determine that they are invalid) from maintaining a ledger representation that keeps them out of the way, thus facilitating easy pruning and fast sync for new peers, which is what I attempted to illustrate in comments 24-26.

mm-hl commented 8 years ago

Responding to @vukolic's 5 + 14):

First, I find it difficult to tell from the proposal what the role/purpose of endorsers is intended to be, which I think affects this conversation significantly, because it's hard to figure out what motivates what. Here are some of the reasons I'm struggling with this and some "reading between the lines" that may help me understand and you clarify:

I note that the proposal does not require committing peers to be able to execute transactions. My comments were coming from a perspective where it would be reasonable for them to be able to do so. Perhaps avoiding this is a key motivation for the proposed design? I can see potential advantages including:

If I reread your response with an assumption that avoiding committers executing transactions is a/the key motivation, it makes a lot more sense to me, so I think I am understanding the proposal and its intentions better now. If this is correct, the proposal would be much improved by upfront discussion of this motivation, probably in Section 1.3. I'll proceed with the assumption that I've got that right.

First, wouldn't it make more sense that the submitter would send the txpayload to endorsers, and the endorsers would execute the chaincode and construct the transaction? That way, the submitter would not need to be able to execute the transaction, thus supporting confidentiality. Presumably endorsement policies would (usually) require that all endorsers produce the same transaction; otherwise, what would make sense to do?

This raises a question: against what state do (submitters and) endorsers run the transaction? Even if chaincodes are deterministic and all participants are honest, they may end up with different state updates and verdeps if they run against different states. Maybe the submitter should nominate a state to run against, and endorsers could endorse only when/if they have caught up to that state?

The key advantage of obstruction-freedom is that it allows us to decouple synchronization from scheduling/contention control, while admitting much simpler mechanisms than are required for lock-freedom and wait-freedom. But obstruction freedom without effective contention control is usually a disaster. I vaguely thought about this when considering the retryFlag. Is that its intended purpose? If so, have you thought much about what kinds of retry policies could be implemented and would make sense? Presumably endorsers would be in the best position to implement retry policies, and thinking about this in the context of your comments about Eve starts to maybe make some sense. For example, endorsers could potentially work together (perhaps, but not necessarily, via a leader) to combine transactions from the same chaincode into orders or batches that might somehow be processed more efficiently by committers, perhaps somewhat analogous to the "mixer" in Eve. This might be mostly internal to endorsers, but the protocol might help by enabling endorsers to communicate ordering/batching dependencies to submitters (or directly to the consensus fabric).

Coming back to the potential motivation for reducing transactions to state updates and version dependencies pre-consensus, it may help to improve throughput for committers, but:

mm-hl commented 8 years ago

One more small comment on terminology. I find it slightly strange in Section 1.2, bullet 2.a that a committing peer "commits the transactions and maintains the state" but is called a "read-only peer" (if it is not also playing other roles). I get that the intention is that it can only support read-only queries from clients if it does not play these other roles, but nonetheless the terminology seems slightly strange to me given that it is maintaining the state and applying updates to it.

vukolic commented 8 years ago

@ mm-hl quickly re 5+14)

The idea of HASH(txPayload) in the alternative design in 2.2 is misplaced - HASH(txPayload) (or tid) should be sent to the consensus service in 2.4 for confidentiality, so basically tran-proposal cannot be reused from 2.2 in 2.4. The PROPOSE message (Sec 2.2) HAS TO carry the full txPayload sent to endorsers for - indeed - execution of the transaction. I will make this clear asap since this is critical.

[EDIT: this is now fixed by having tran-proposal contain a HASH of the payload, while PROPOSE contains txPayload, explicitly and outside tran-proposal.]
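
For readers following the thread, a rough Go sketch of the two message shapes as just described (full txPayload in PROPOSE, only its hash in tran-proposal); the field names are guesses at the intent, not a wire format from the proposal:

    package messages

    // Propose is sent by the submitting peer to endorsers and must carry the
    // full txPayload, since the endorsers actually execute the transaction.
    type Propose struct {
        TxPayload []byte // full chaincode invocation payload
        Tid       []byte // transaction identifier
        // ... client signature, chaincode ID, etc.
    }

    // TranProposal is what endorsers sign and what eventually reaches the
    // consensus service; it carries only HASH(txPayload), keeping the payload
    // itself confidential from the consenters.
    type TranProposal struct {
        TxPayloadHash []byte            // HASH(txPayload)
        StateUpdates  map[string][]byte // key -> new value produced by simulation
        VerDeps       map[string]uint64 // key -> version read during simulation
    }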

As for your remarks on confidentiality - precisely - this is one of the main motivations of this approach. I am preparing an extension of the document that will deal with confidential chaincode examples in this architecture so things should become more clear then.

As for synchronization for obstruction freedom - the expectation here is that the data model will permit less synchronization (e.g., "spend-once" UTXO). If this turns out not to be the case - we will need synchronization (e.g., leader) in the vein of Sieve/Eve - perhaps only for state partitions that hold such objects.

vukolic commented 8 years ago

@ mm-hl re 9 + 23)

" That does not prevent consenters (who can determine that they are invalid) "

In fact consenters - in the current proposal - have no way to determine whether a tx is invalid or not. This can be only done by (committing) peers.

The design is currently general in that it separates consenters from (committing) peers. The main motivation is to allow an easily pluggable, generic consensus service that does not have to care at all about chaincode state, verifying endorsement policies, and the like. Such a consensus service could inherently be as scalable and performant as possible, as execution/validation is totally removed from the critical path.

The cost of this genericity and modularity is the fact that consenters (consensus service) cannot tell valid and invalid transactions apart. This needs to be done at the (committing) peer level. If this is a show-stopper, it is fairly easy to specialize this architecture to have consenter >= committing peer. This would come at the expense of sacrificing advantages listed above.
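
A minimal sketch of what such a pluggable, generic consensus (ordering) service could look like as a Go interface over opaque blobs; the names Broadcast/Deliveries mirror the proposal's broadcast/deliver terminology, but the exact signatures are assumptions for illustration:

    package consensus

    // Delivery is what the ordering service hands to peers: an opaque blob
    // together with its agreed position in the total order. The service never
    // inspects blob contents, so it cannot tell valid from invalid transactions.
    type Delivery struct {
        SeqNo    uint64
        PrevHash []byte
        Blob     []byte
    }

    // Service is a pluggable total-order (atomic) broadcast abstraction.
    type Service interface {
        // Broadcast submits an opaque blob for ordering.
        Broadcast(blob []byte) error
        // Deliveries returns a channel on which ordered blobs arrive,
        // in the same sequence at every connected (committing) peer.
        Deliveries() <-chan Delivery
    }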

Besides 5+14 - this is another big design decision that we as HL community need to take.

corecode commented 8 years ago

Including stateUpdate and (more importantly) verDep in the PROPOSE message to endorsers allows endorsers to be (internally) sharded on the chaincode state keys.
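
A small, hypothetical sketch of such internal sharding: because PROPOSE already carries the touched keys, an endorser can route the request to a shard without executing anything first (the FNV-based routing below is just one illustrative choice):

    package endorser

    import "hash/fnv"

    // shardFor maps a state key to one of n internal endorser shards.
    // Since the PROPOSE message already contains the verDep/stateUpdate keys,
    // dispatching can happen before any chaincode execution.
    func shardFor(key string, n uint32) uint32 {
        h := fnv.New32a()
        h.Write([]byte(key)) // Write on an fnv hash never returns an error
        return h.Sum32() % n
    }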

cca88 commented 8 years ago

@mm-hl: Thanks for your comments and the discussion.

Re 13) Your text says what was meant, this is better.

Re 14) Yes, this would essentially be a transaction manager running together with consensus on blockchain.

Re 26) The proposal mentions this, it's the optional "(Blockchain) State" data structure that can be part of the hash chain, included through the root of a hash tree on the state, just as you describe for Ethereum. Makes a lot of sense as an optimization, not strictly needed for security. We will need to decide whether to have this.

mm-hl commented 8 years ago

@vukolic Sorry, I caught myself a number of times saying "consenter" when I meant "committer" but obviously didn't catch them all. Yes, I totally understand and agree, and my sentence was about what consenters store, nothing to do with committers.

@corecode True, but at the expense of confidentiality from the submitter, since it has to execute the transaction in order to determine the stateupdate.

@cca88 The "(Blockchain) State" mentioned in the proposal is described simply as a KV store. I don't see anything that suggests it's stored as a Merkle structure, or that its Merkle root could/would be included in checkpoints, which is what is needed for supporting fast sync.

Also @cca88 I don't get your comment about 14). Are you agreeing that different consistency conditions for different chaincodes could make sense and commenting on how this would be implemented?

cca88 commented 8 years ago

@mm-hl: Right, this was confusing: I thought that optionally including the blockchain state also meant to include it inside 4.2 as part of the hash chain. (That hash chain now is built over the VL.) Will clarify this option.

14) Yes, agree. IMHO the design moves towards a transaction manager (TM) being implemented inside the logic for applying the updates; the TM ensures a chosen consistency condition (isolation guarantee). The conditions couldn't be implemented by chaincode itself; only hard-coded ones in the fabric seem possible. Chaincode deployment can pick the isolation guarantee that it wants and which the programmer understands. There is a line of literature that has investigated how to run a replicated DB without a central point of control (over a reliable group communication system, tolerating crashes only) by Kemme, Jimenez-Peris, Patino-Martinez, Alonso, Schiper and others. It is summarized in a textbook (Database Replication, Morgan & Claypool). When extending the approach here to full generality, it would become the equivalent of their "database replication" but in the BFT model.

tock-ibm commented 8 years ago

Issue: committing peers can change the order of blobs from consensus.

The question of Byzantine committing peers was already raised. However, while it is true that a Byzantine committing peer cannot alter the content of a blob provided by the deliver() event of the consensus, it seems to me that it can change the order between blobs. That is, the deliver provides {sequence-num, blob, hash}; anyone can compute a new hash for a different {sequence-num', blob} pair. For the ordering provided by the consensus to be resistant to forgery it must be signed by some entity, not just hashed.
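
Restating the concern as a hedged sketch (the struct fields are illustrative, not the proposal's wire format): a hash alone is recomputable by anyone, so only a consenter signature binds a blob to its position in the order.

    package deliver

    // Event as currently specified: {sequence-num, blob, hash}.
    // A Byzantine committing peer can fabricate a consistent-looking event
    // for a different (seqNo, blob) pair simply by recomputing the hash.
    type Event struct {
        SeqNo uint64
        Blob  []byte
        Hash  []byte // recomputable by anyone, so it does not bind the order
    }

    // SignedEvent sketches the fix suggested above: the consensus service
    // (or a set of consenters) signs over the ordered content, so the
    // ordering cannot be forged by a committing peer.
    type SignedEvent struct {
        Event
        PrevHash   []byte
        Signatures [][]byte // one or more consenter signatures over the event
    }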

vukolic commented 8 years ago

@tock-ibm

Byzantine committing peers can indeed do what they want with their raw ledger, producing a hash chain that verifies but is bogus. However, this is not an issue, since such Byzantine peers cannot convince any honest peer of their chain. Namely, in the state transfer and checkpointing protocol (Sec 5), an honest peer gets the head (tip) of the raw ledger hash chain from the consensus service and then resolves it backwards via peer-to-peer communication.
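
A rough sketch of this check, under the assumption of a simple prev-hash chain over raw blocks; the helper names are made up for illustration:

    package statetransfer

    import (
        "bytes"
        "crypto/sha256"
        "errors"
    )

    type RawBlock struct {
        PrevHash []byte
        Payload  []byte
    }

    func hashBlock(b RawBlock) []byte {
        h := sha256.New()
        h.Write(b.PrevHash)
        h.Write(b.Payload)
        return h.Sum(nil)
    }

    // VerifyChain walks blocks from newest to oldest and checks that they
    // hash back to the trusted head obtained from the consensus service.
    // A chain fabricated by a Byzantine peer will not resolve to trustedHead.
    func VerifyChain(trustedHead []byte, newestToOldest []RawBlock) error {
        want := trustedHead
        for _, b := range newestToOldest {
            if !bytes.Equal(hashBlock(b), want) {
                return errors.New("hash chain does not resolve to trusted head")
            }
            want = b.PrevHash
        }
        return nil
    }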

Now, whether consensus service delivers multi-signed batches to peers or it delivers via f+1 consenter confirmations - this is up to a consensus service implementation.

vukolic commented 8 years ago

A subset of comments above is included in the new document revision. Notice that this does not mean that other comments will not be taken into account.

Notable change is that an endorsing peer now maintains the state. As a result, there are no committing peers any more, but simply peers, which can have additional roles of submitters and endorsers.

mihaigh commented 8 years ago

@vukolic the transaction flow should be updated after this peer unification.

mm-hl commented 8 years ago

These comments are based on this version.

27) Regarding the changed definition of submitting peers (Section 1.3.2.a), this appears to address the confidentiality issue in previous versions, because submitting peers accept transactions pertaining to specific chaincodes, so submitting peers for a given chaincode can be a subset of endorsers for that chaincode. However, it also puts the burden on clients of knowing which submitting peers can accept transactions for which chaincodes. This means that clients would need to manage issues like figuring out an alternative submitting peer in case the one they have previously remembered is unavailable or misbehaving.

An alternative would be to keep the previous structure of allowing any submitting peer to accept transactions for any chaincode, but to remove the responsibility it previously had to execute the chaincode. Instead, it could identify a "lead" endorser, which would execute the transaction, produce the proposed read/write sets, and request additional endorsers to verify correct execution and endorse the transaction. The submitting peer would only need to wait for a sufficient set of endorsements to show up. (What it does if they don't is a separate issue; maybe it times out and picks another lead endorser.) The submitting peer could inform the client which lead endorser was successfully used, allowing the client to "cache" the result for optimizing subsequent transactions for that chaincode, but would not require such caching or require it to be up to date or consistent. I think this arrangement would better fit the description of "providing an interface between clients and the blockchain", as it removes unnecessary burden from clients. (Also, it would make the "stateless" and "any peer" advertising in Section 1.3.i. more accurate.)

28) Section 1.3.ii seems repetitive coming so soon after the introductory material in Section 1.3. Furthermore, it isn't quite consistent with the earlier material. For example, the "pertaining to a particular chaincode" part of submitting peers that I addressed above is not reflected here.

29) In Section 1.3.iii, I think the specificity of "for transactions and state updates" is potentially misleading and confusing, given that the consenters themselves know nothing about the opaque "blobs" they are ordering.

30) In the second paragraph of the same section, "reliability" is mentioned, but what is described thereafter is really only about atomic broadcast. There is no stated or implied guarantee that all messages offered will eventually be delivered, or anything else that I would consider as "reliability" guarantees. Maybe just say "different implementations may offer different reliability guarantees" or something like that? (Now I see that this is addressed better in later text, so maybe just add "implementation-specific" here.)

31) Also, given that you're going to the trouble to accurately describe what the "consensus fabric" does (atomic broadcast), why not take the opportunity to excise the common abuse of the word "consensus", maybe putting a footnote to avoid confusion for people who are accustomed to (mis)using "consensus"? Maybe even go so far as to rename consenters and consensus fabric, for example to broadcasters and broadcast fabric? But maybe the terminological abuse is widespread enough that there is no point resisting it.

32) I think the remark in Section 1.3.iii is too vague and speculative to be more than a distraction.

33) I think the Safety guarantee paragraphs would be improved if more precise properties were stated first, and then intuitive summaries and observations included later if they are still needed, rather than starting with imprecise descriptions and then refining them with "this means", "note that", "put differently", "in other words", etc. How about this (to encompass both safety guarantees):

There exists some sequence M consisting of messages m_i = (seqno_i, prevhash_i, blob_i), i=0,1,...:

What this doesn't address:

36) Regarding checkpoint validity policies, while it is alluded to in an example in a sub-bullet, I think it's worth pointing out explicitly that a weaker checkpoint validity policy (or combination of local checkpoint validity policies) can undermine fault tolerance properties because it may cause a correct peer to behave "incorrectly".

37) In Section 5.2.i, I suggest s/blocknohash/blockhash/g as the hash is for the block, not the block number, right?

38) I still don't find the checkpoint and state transfer description very convincing or compelling; the previous comments still mostly apply.

vukolic commented 8 years ago

@mihaigh - fixed.

@mm-hl:

27+28) The intention was to have a submitting peer able to act on any transaction, except those pertaining to confidential chaincodes (that section will appear soon). The text now consistently reflects this.

As for the leader, I would like to treat this separately. After some discussions - and following your concerns, as well as the concerns about the concurrency and programming-model impact of version dependencies raised by the community - the idea is as follows: offer both the option of leaderless verDeps (MVCC), as the text now stipulates, and a leader-based approach. The leader would be elected per chaincode, with the assistance of the consensus service.

While chaincode could implement its own leader election module on top of consensus (total order broadcast), this may be suboptimal and very complicated for chaincode developers, so the idea is to have fabric support for a few typical leader implementations.

Notice that chaincode could opt for leaderless variant (default) or some leader-election policy built in the fabric, or simply implement its own leader election.

Notice also that in such an approach, the current leaderless MVCC is just a special case of a leader function implementation, in which the leader election chaincode deployed at every submitting peer would be

leader(chaincodeID) return myID

For an actual leader-based implementation, the leader function would look at the blockchain state to determine the current leader and act appropriately (per leader election policy/chaincode) when necessary to help elect the leader.
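
A hedged Go sketch of this idea; the names LeaderFunc, StateReader, etc. are hypothetical, and the state-based policy is only meant to show where a real election policy would plug in:

    package leader

    type PeerID string

    // StateReader is a hypothetical read-only view of the blockchain state.
    type StateReader interface {
        Get(key string) ([]byte, error)
    }

    // LeaderFunc decides which peer sequences transactions for a chaincode.
    type LeaderFunc func(chaincodeID string, state StateReader, myID PeerID) (PeerID, error)

    // Leaderless is the default ("MVCC") policy: every submitting peer acts
    // as its own leader, exactly the leader(chaincodeID) = myID special case
    // mentioned above.
    func Leaderless(chaincodeID string, state StateReader, myID PeerID) (PeerID, error) {
        return myID, nil
    }

    // StateBased sketches a leader-based policy: look up the current leader
    // recorded in the blockchain state under a per-chaincode key.
    func StateBased(chaincodeID string, state StateReader, myID PeerID) (PeerID, error) {
        v, err := state.Get("leader/" + chaincodeID)
        if err != nil || len(v) == 0 {
            return myID, err // no leader elected yet; caller triggers election
        }
        return PeerID(v), nil
    }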

I will work out the first draft along these lines and submit - so we can discuss further.

29-32) fixed

33-35) Agree, this is TBD properly. Notice that 34 and 35 are addressed in prose already.

36-38) Changes are pending to the checkpointing mechanism to include the state hash. Will post when ready for review.

mm-hl commented 8 years ago

@vukolic thanks for the responses.

Regarding 27+28, please note that I was not suggesting a different or additional approach to concurrency, which I agree is a separate topic. My point was only regarding whether a submitting peer could accept a transaction for a confidential chaincode for which it does not have the ability to execute the transaction. The key part of the suggestion is: identify a "lead" endorser, which would execute the transaction, produce the proposed read/write sets, and request additional endorsers to verify correct execution and endorse the transaction. All of that assumes the MVCC model, and just addresses who has the ability to execute and prepare the MVCC-style transaction.

Although I used the word "lead" in "lead endorser", the suggestion does not entail or require any form of agreement or election, and thus none of the complexity you discussed in your response. The "lead endorser" could instead be called the "initiating endorser" and could be chosen at random or via any other policy. A different choice could be made per transaction. Indeed, a submitting peer could even choose to send the "initiation request" to multiple endorsers, again chosen randomly or otherwise, and then wait for a sufficient set of endorsement responses to satisfy the endorsement policy, which might or might not all derive from the same "initiating endorser".

The main change I am suggesting is regarding step 2.2 in the transaction flow: The submitting peer prepares a transaction and sends it to endorsers for obtaining an endorsement. This is what puts the burden on submitting clients of knowing which submitting peers to target for which chaincodes. It sounds like you're preparing more detailed treatment wrt confidential chaincodes, so I will try to review again when that appears.

Thanks for the good work!

vukolic commented 8 years ago

@mm-hl

ack - the incubating confidentiality draft is precisely doing something similar - that one will appear for review/comments/discussion soon (I will of course post a notice here).

This does not change the story re leader in the context of concurrency as discussed in my previous comment. We agree this is a separate issue and will be treated in the separate section of the proposal - so we can discuss it then in more details.

manish-sethi commented 8 years ago

I wanted to discuss here the aspect that causes the most departure from the current architecture - transforming a transaction from its original form (issued by a user - invoke a chaincode method) into a form that includes state updates. This is what requires new functions such as endorsers and committers, new artifacts such as the read-write set/verDep, and also changes the transaction execution flow significantly. At first sight, it appears to me that the main motivation is confidentiality, as the other advantages such as separating the consenter nodes are more about separating the functions performed by a single peer in the current architecture. So, is the main motivation to allow deployment of a chaincode to a specific set of peers and still maintain a single global ledger (transactions and state)? If yes, then why not instead have a separate ledger altogether for each such confidential chaincode (or trust group)? The peer that does not have the chaincode cannot operate on the corresponding state anyway, and moreover it may be confusing to allow an unrelated peer access to the state data but not to the chaincode. Allowing non-determinism in the chaincode may be another side advantage, but I am not certain whether that could be the main motivation. Similarly, agreeing on state updates beforehand could be another, but I guess checkpointing also serves a similar purpose.

So, basically, I wanted to find out the main motivation of including the state updates in the transaction definition which causes a significant change in flow.

vukolic commented 8 years ago

@manish-sethi

The high level motivation is to simplify the fabric implementation. This motivation is broken down as follows (in no specific order): 1) handling non-determinism, 2) allowing more parallelism in chaincode execution (endorsement), 3) providing a simple mechanism of ensuring that a transaction is never executed more than once.

1) Handling non-determinism is an extremely important motivation, especially when we replicate code written in a high-level language such as golang/Java/etc. With a model without state updates (taken, e.g., by the Sieve protocol - http://arxiv.org/abs/1603.07351 - which is the prototype in the current HL fabric), there is always a case in which a non-deterministic transaction would appear deterministic during execution and, as such, be committed to the ledger. However, if one only logs the transaction payload of such a transaction and not its state updates, a replica repeating the transaction execution sequence (e.g., a new peer) could end up executing the transaction payload and arriving at a divergent state (due to non-determinism). This can of course be handled through state transfer instead of sequential execution - but it complicates the system. Hence the choice for state updates, which yield a simpler (fabric implementation-wise) way to deal with non-determinism at the fabric level.

(as a side note: state updates are directly applicable to Sieve as well, and would turn Sieve into leader-based protocol for handling non-determinism with state updates)

2) As for parallel execution, state updates + version dependencies are nice, since they allow leader-free parallel execution. This is not the only way to implement parallel execution (which can be also leader based, cf. Eve - OSDI'12) but again, it appears as a simpler one.

3) A further motivation is to simplify the implementation with respect to transaction execution, as with state updates (and version dependencies) one cannot "accidentally" execute the same transaction twice (see the ZAB paper on ZooKeeper atomic broadcast by Junqueira et al., DSN'11, for discussion on this one).

manish-sethi commented 8 years ago

@vukolic thanks for highlighting the thought process behind the proposal. To discuss these further, can you have a look at the following?

1) For the case of a new peer joining, we can always include the state updates produced by the transaction payload into the block when we execute the transactions in the current architecture. In other words, a block in the blockchain can still look the same as it appears in the endorsement based model. However, across live replicas we would rely upon checkpointing, which I think is the case even in the endorsement based model (please correct me if I am wrong here).

2) About parallelism, the proposed approach primarily tries to compensate for the cost of gRPC communication between a chaincode and a peer (assuming these are not high CPU consuming transactions) at the cost of additional overheads, which include signing and collecting the endorsements, performing the additional ledger reads (one during simulation and another during commit - for version matching), and rollbacks in the case of version conflicts. I am sure that these overheads may be compensated for in large networks where endorsement is required from a significantly smaller subset of peers. However, for small-sized networks where a larger subset is involved in the endorsement, the final performance may be poorer because of these overheads. I am just concerned and wanted to know whether we have some measurements of these costs and some sense of at what configurations (e.g., network size and ratio of endorsers to network size) they start showing benefits.

2) (a) In the past, I was thinking of a simple approach for allowing parallelism in the current code base. Let me write it briefly here and run it by you to see if it makes sense. Below is a rough description of how this would function (see also the sketch after this list): execute transactions in a block in parallel and maintain a map {k -> TxId} where a transaction TxId reads/modifies the key k. If there is no conflict (against each key only a single TxID entry is present) at the end of batch execution, commit the transaction results. In the case of a higher number of conflicts, fall back to sequential execution. Further, if there is a smaller number of conflicts, some optimization can be applied, e.g., rolling back and re-executing conflicting transactions in the relative order they appear in the block (there are corner cases here but I think they are not critical to discuss right now). Let me know if you think that this could be made leader-free by dynamically monitoring the conflicts and relying on checkpointing at a later stage. Up to a certain network size, some simple parallelism like this may result in better performance. But I am not sure if both approaches can be merged in a single architecture, employing one vs the other based on deployments. As a side note, I believe that the above-mentioned approach of dynamically monitoring conflicts is orthogonal and could be included in the endorsement based architecture as well, where an endorser collects and executes the endorsement requests in parallel.

3) I haven't read this paper but did not understand the comment about executing a transaction twice by mistake. Is it just about adding a check that looks up the txid in the blockchain before executing again, or something more than that?
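
As referenced in 2(a) above, here is a rough Go sketch of the conflict-map idea, assuming deterministic chaincode, a per-block batch, and speculative execution against a common snapshot; the simulate callback and type names are placeholders, not anything from the current code base:

    package parallel

    import "sync"

    // Tx is a placeholder for a transaction in a block.
    type Tx struct{ ID string }

    // tryParallel speculatively executes a block's transactions concurrently.
    // simulate runs one transaction against a common snapshot and returns the
    // keys it read or modified. A key -> txID map is maintained as results
    // come in; if two transactions touch the same key, the batch is flagged
    // as conflicting.
    func tryParallel(block []Tx, simulate func(Tx) []string) bool {
        var (
            mu       sync.Mutex
            wg       sync.WaitGroup
            owner    = map[string]string{} // key -> ID of the tx that touched it
            conflict bool
        )
        for _, tx := range block {
            wg.Add(1)
            go func(tx Tx) {
                defer wg.Done()
                keys := simulate(tx) // speculative execution
                mu.Lock()
                defer mu.Unlock()
                for _, k := range keys {
                    if prev, ok := owner[k]; ok && prev != tx.ID {
                        conflict = true // same key touched by two transactions
                    }
                    owner[k] = tx.ID
                }
            }(tx)
        }
        wg.Wait()
        return conflict
    }

On conflict the caller would discard the speculative results and either re-execute the whole batch sequentially or replay only the conflicting transactions in block order, as suggested in 2(a).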

vukolic commented 8 years ago

@manish-sethi Many thanks for your discussion.

1) We can add state updates to the current architecture, yes, and also to the Sieve way of handling non-determinism. In fact the proposal here is a generalized Sieve, where state updates are moved around instead of the transaction payload (this is rather straightforward to add to Sieve) and we do not have a leader any more (this is more elaborate - hence the version numbers). BTW, I am not sure I understood what you are aiming at here 100%, so I might not be replying to the question.

2) You may be right about a slight overhead (higher latency) when you think of a single chaincode. However, even with a single chaincode, separation of concerns allows more parallelism in execution, as your sending money to some account could be endorsed by you only, and my sending money within the same chaincode could be endorsed by me only. So in this case we have parallel execution, whereas in the current architecture we would have a sequential one. Hence the speedup (in throughput) even for a single chaincode. The way I see it, we have a much more pronounced problem today with HL fabric throughput than with its latency. Also, for a majority of blockchain applications, throughput will matter, so long as latency is reasonable.

Furthermore, if a single chaincode is doing things sequentially, there is not much that can help its performance even with the current architecture. However, with the current architecture of HL fabric, such slow execution of a single chaincode would impact the performance of the entire blockchain, including chaincodes that may be executing/endorsing much faster - because of global sequential execution. Hence, the partitioning of endorsers in this architecture is intuitively better for combining multiple chaincodes in a single ledger.

2a) What you describe is exactly the approach of Eve (OSDI'12) I was referring to. Regarding leaderless parallelism vs leader-based parallelism, there is a tradeoff with respect to the granularity of state that concurrent transactions touch. For fine-grained data models (such as UTXO) we can have a lot of parallelism with a leaderless approach. For more coarse-grained objects, we may want a leader - to, as you say, employ one vs the other in different chaincodes. In this context please look at my answer above to mm-hl (27+28). This "best of both worlds" of the leaderless vs. leader-based approach is certainly to be added to this design document.

3) This is not the main motivation, but the approach in principle allows for a consensus service that would deliver a transaction more than once, as in "at least once" semantics rather than "exactly once" semantics (although the current specification does not allow this). Namely, even if the consensus service had "at least once" semantics, because of version dependencies such a transaction would never be executed more than once (it is idempotent). The ZAB paper has more discussion on this, arguing that "at least once" is simpler to implement than "exactly once".
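
To illustrate points 2) and 3) together, a minimal sketch of commit-time version checking over a simple key -> (value, version) state; all names are illustrative. A transaction whose readset versions no longer match is rejected, and a re-delivered transaction whose earlier commit already bumped a version it read fails the same check, which is what makes commits idempotent:

    package mvcc

    type VersionedValue struct {
        Value   []byte
        Version uint64
    }

    type State map[string]VersionedValue

    type Tx struct {
        ReadSet  map[string]uint64 // key -> version observed at simulation time
        WriteSet map[string][]byte // key -> new value
    }

    // Commit applies tx only if every key it read still has the version it
    // observed during simulation (the serializability check discussed in this
    // thread). Re-delivering a tx whose writes already bumped a version it
    // read is harmless: the check fails and the state is left unchanged.
    func Commit(s State, tx Tx) bool {
        for k, v := range tx.ReadSet {
            if s[k].Version != v {
                return false // stale version: reject, record as invalid
            }
        }
        for k, val := range tx.WriteSet {
            s[k] = VersionedValue{Value: val, Version: s[k].Version + 1}
        }
        return true
    }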

JoVerwimp commented 8 years ago

I have not been able to go through the entire discussion here, but wanted to share initial thoughts at this point.

Initial remarks on the text, not the content:

  1. For a reader, it would be easier if the text first described the current architecture briefly, together with its issues/limitations, and then embarked on describing the proposed architecture and how it addresses those issues/limitations.
  2. Some terms are used with various semantics attached to them, e.g. blockchain: "consent on the blockchain order" (incorrect use?) vs. "deployment chaincodes whose state is part of the blockchain but..." (?) vs. "The blockchain is a distributed system consisting of many nodes that communicate with each other" vs. "The communicated messages are the candidate transactions for inclusion in the blockchain." vs. "a peer appends the transaction to the log (blockchain) and..."

On the content:

  1. Sections 3.1 & 3.2. Looking at this from a use case requiring the trade content, and even the trading patterns, to be exposed only to the trade stakeholders: how would the endorsement policy be able to support such a pattern? Suppose the data read and produced by the transaction (e.g., the trade details) is shared only between the stakeholders (and therefore stored off-chain), while the hash/signature is stored on the (raw) ledger. The hash/signature stored on the (raw) ledger should not allow the identity of the involved stakeholders to be revealed, except to the stakeholders. If an endorsement policy defines the endorser set as the stakeholders identified in the transaction itself, would this allow the endorsers to remain anonymous within the system while still establishing the required trust (identity) towards each other? Can the signatures (and the certificates used to create them) mentioned in section 3.2 be such that they support this?
  2. Section 2.3: "Alternative design: An endorsing peer may omit to inform the submitting peer about an invalid transaction altogether, without sending explicit TRANSACTION-INVALID notifications." Note that this would require an algorithm to determine a dead node and a time-out definition that avoids flooding the peers with PROPOSE messages while guaranteeing system responsiveness.
  3. Section 2.4, paragraph 2: note that this paragraph does not distinguish behaviour under error conditions. E.g., if the STALE_VERSION answer comes back, it does make sense to let the submitting peer start again from step 2.2.
  4. Section 4.1. It would clarify things if it stated explicitly that a batch can be configured to have a maximum size (maximum number of transactions in it) of 1 or more. Note that 1 would basically disable batching.

manish-sethi commented 8 years ago

@vukolic thanks for your detailed reply.

1) I am just trying to weigh the pros and cons of the endorsement-based approach against the other possible alternatives. As you agree that a new peer can join the network in the same manner (i.e., by transferring blocks and state updates) even in the current approach, the meaningful difference is in handling non-determinism between checkpoints. In the current approach, you would execute transactions first and then agree on the results, while in the proposed approach you would agree first and only then call it a valid transaction. I agree with you that the latter is less complex to implement, but I am not sure about the added complexity because of policies etc.

However, because it changes the transaction definition (since it now needs to include the versioning details), I am not sure whether this could have undesired implications. I just want to discuss this with the help of an example: could it open up the possibility of a deliberate attempt to invalidate transactions so that they never appear on the blockchain? For example, suppose a transaction transfers some assets from A to B and requires endorsement from both A and B. In the current architecture, the transaction appears in a block, and if the execution results differ at A's and B's nodes, it can only be because of non-determinism of the chaincode or malicious behaviour, but never because of the normal functioning of a correct fabric. In the endorsement model, A may intentionally never endorse, and the transaction may never reach the consenters because of insufficient endorsements. Even if you allow transactions to be included without sufficient endorsements (not for committing but just for recording), a meaningful validation is hard (because the rejection cause can always be attributed to a version-dependency mismatch during transaction simulation, which is normal functioning of the fabric).

2) I think that you got me wrong here. I was not discussing the parallelism that the endorsement model offers over the current codebase of sequential execution. Rather, again, I simply wanted to weigh the pros and cons of other possible alternatives for enabling parallelism. Sorry if my text was confusing. It's good if you think that both approaches (Eve-based and endorsement-based) have their place and you intend to have them in the architecture. However, more than coarse-grained vs. fine-grained, I had a different dimension in mind, based on resource consumption. Let me explain that here. The execution of a transaction mainly involves executing chaincode (cpu + grpc communication cost) and disk access cost (dominated by random reads of keys by the chaincode). Now, in Eve-based execution, each transaction executes once at each node (though it is a parallel execution, each transaction's full execution still happens at each node). In the endorsement-based approach, a part of the above-mentioned cost (i.e., the disk access cost) is incurred at each node, while the other costs (cpu + grpc cost) are incurred only at a subset of nodes (the endorsers), but with the additional overheads of signing, additional disk costs during transaction simulation, larger payloads, etc. I was highlighting that the ratio of the average size of the endorser set to the network size would probably be a deciding factor in whether the endorsement-based approach gains or loses performance compared with the Eve-based approach. So when I referred to the additional cost, I was not referring to the added latency; I was referring to the fact that in some settings this cost would hamper throughput in comparison with Eve-based parallelism.

Finally, what are your thoughts on an orthogonal dimension related to parallelism, where we may not maintain a global ledger at all and instead let all the components (blockchain/ledger/consensus) exist separately for separate trust groups (say, for each chaincode in a simple setting), with the different trust groups running in parallel? I am just wondering about the value of the transactions and data of a chaincode lying on my ledger when I do not have the chaincode and cannot operate on that data.

christo4ferris commented 8 years ago

I have to say that this is a rather clumsy approach to collaborative editing and discussion of a document. The mailing list or a Google Doc would be better suited to this task. I wanted to make some editorial edits but thought it better to send a marked-up version in Word, since the wiki isn't the best tool for this.

Some general comments: 1 - the document is rather inconsistent in its use of terms: 'consensus service' and 'consensus fabric' seem to be interchangeable. Pick one. 2 - this is really pub/sub, not broadcast. I also found it awkward that we say we can partition à la topics in pub/sub and then proceed to assume there is but a single channel (broadcast). It seems to me that we should preserve the pub/sub notion throughout (note, I am a big proponent of pub/sub and implemented a global-scale pub/sub fabric/substrate for Sun a lifetime ago). I say this because the paper omits the whole subject of subscribing and substitutes "connects to the channel provided by the consensus fabric". In fact, if we want to be comprehensive in describing what is going on, we are connecting with the consensus fabric, re-establishing our subscriptions, and (re)identifying the topic(s) to which we might publish.

  3. given that this is really pub/sub, we shouldn't call the operation broadcast but publish. We aren't broadcasting, we are publishing to a topic/channel. A true broadcast would go places the message wasn't necessarily wanted. I would assert that while a very simplistic implementation might send the message to every node, this doesn't scale.
  4. is the total ordering preserved across all topics/channels, or just per topic/channel?
  5. does the prevHash apply to the previous message in a given topic, or to all messages regardless of topic/channel?
  6. does the raw ledger apply to all messages received from the consensus fabric, or is it per topic/channel?

cca88 commented 8 years ago

@christo4ferris:

Editing - yes, but this is more like code, one can't simply rewrite a paragraph without considering the rest. And, couldn't you edit the source directly via git?

2 - this is really pub/sub not broadcast

Let me disagree, I am actually in favor of dropping the "pub/sub" terminology. Pub/sub is relevant when you can subscribe to multiple "topics" and usually doesn't care much about strict ordering or delivery guarantees. The design here talks only about "topic" = one blockchain. On the side, the text says this could be pub/sub with multiple channels, but actually offering that (your comments 4-6) will need a deep technical discussion on how to order transactions of different channels w.r.t. each other. We aren't there yet.

What matters most is "consensus" -- the promise that everyone receives the same transactions in the model where there is one blockchain only. That term has been picked up everywhere for this feature of blockchains. Technically, in the relevant literature that I cite from and have contributed to, the appropriate term for it is "atomic broadcast" or "total-order broadcast" because it implies agreement on an ever-growing sequence of messages with transactions. Calling this "consensus" is slightly problematic because "consensus" also means the single-instance primitive, the one where the system ever only agrees once; but this confusion is all over the literature, hence people should look deep enough to understand the difference between, say, "paxos" and "multi-paxos". Hence I can also live with calling it "consensus service".
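For illustration only, a minimal Go interface capturing the atomic-broadcast promise described above; the names `AtomicBroadcast`, `Broadcast`, and `Deliver` are placeholders, not the actual service API:

```go
// Package consensus sketches the abstract ordering service interface only.
package consensus

// Message is an opaque transaction blob handed to the ordering service.
type Message []byte

// AtomicBroadcast captures the promise discussed above: Broadcast submits a
// message, and Deliver registers a callback that every correct node invokes
// with the same messages, in the same total order, each carrying its sequence
// number and the hash of the previous message.
type AtomicBroadcast interface {
	Broadcast(msg Message) error
	Deliver(handle func(seq uint64, msg Message, prevHash []byte))
}
```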

I suggest we build this first with one "channel" as discussed here, and once there, move to a "multichannel" design and call it "pub/sub service" then.

christo4ferris commented 8 years ago

I can, but it felt awkward just making edits directly without discussion... a PR is a different beast, but with a document there is an editor and there are contributors, and from my experience Google Docs has it about right for collaborative editing of a document, allowing in-line comments and suggested edits.

mm-hl commented 8 years ago

@vukolic I intended to say something in earlier discussion that is close to what I interpret @manish-sethi to be saying in part 2a, and I don't agree with you that it is exactly the approach of Eve, OSDI 12.

My interpretation of what @manish-sethi said is that transactions in a block should be executed "as if" they were executed in sequential order. In Eve, transactions are executed in parallel without such a requirement, allowing for the possibility that different peers get different results. Then, further mechanisms are required to cope with such divergence (the Verification stage), and additional measures are needed to make (excessive) divergence less likely (the Mixer).

If transactions in a block should be executed "as if" in order, then optimistic techniques, such as the one @manish-sethi sketches and others including transactional memory, can be used, and if they are not successful, then we will fall back to a less optimistic method (maybe eventually to fully pessimistic), but we will not complicate the rest of the protocol by allowing the divergence to escape from the local execution. For use cases in which conflicts are rare or nonexistent (perhaps due to successful "mixing" or just because of the nature of the use case), the overhead should be quite low and the parallelisation profitable. When conflicts are more common, it's not clear to me that allowing the divergence to propagate is a win.

To put this in context, let me repeat that I'm unconvinced by the value of and motivation for supporting nondeterministic chaincodes. Saying that it's useful if we write chaincodes in languages such as Go and Java isn't motivation for nondeterministic transactions, it's motivation for finding better ways of expressing chaincodes/smart contracts. The Eve approach introduces nondeterminism even if there is none to start with, thus creating a problem that then needs to be solved. Unless and until someone convinces me otherwise, I think chaincodes should be deterministic, preferably (eventually) enforced by the language or whatever is used to express them and/or formal methods, so I don't think we should prefer mechanisms that introduce nondeterminism, and we should not justify their need by an assumption that chaincodes will be nondeterministic either. If we didn't feel the need for better approaches to expressing chaincodes last week, we certainly should this week :smile:.

vukolic commented 8 years ago

@mm-hl @manish-sethi There are at least two things here: 1) @manish-sethi 2a vs Eve and 2) non-deterministic chaincodes. Let me tackle these separately.

1) I re-read @manish-sethi's 2a) and I still see that it is actually a special case of Eve, as the parallel execution succeeds only "if there is no conflict (against each key only a single TxID entry is present)", whereas Eve could sometimes succeed in executing in parallel even if more than one TxID per key is present, provided the mixer does a good job partitioning requests. Both approaches fall back to sequential execution if this is not the case. Notice that Eve's mixer can be optimized if it knows exactly which objects a tx modifies - which is the information @manish-sethi's 2a) apparently has - in which case it could be made in a way that never makes mistakes, and in that case it would never produce divergent results, so the subsequent complexity could be avoided. Yet Eve does not make this assumption, so it is more complex.

2) As for non-deterministic chaincodes: I fully agree that in an ideal world these must never come to the fabric as non-deterministic. But we are simply not there yet, as HL (fabric) currently does not have a DSL that would disallow non-determinism (I do agree, fully, that in HLP we do need such DSLs). Yet, until that time comes, the fabric needs to ensure it protects against a trivial DoS in which somebody deploys a non-deterministic chaincode, issues a non-deterministic tx, and puts the peers in divergent states.

As a side note, cf. last week's events, it appears to me that the smart contract that caused those issues was in fact deterministic - but simply not well understood by its designers/developers. Namely, every time that "attacking" sub-contract would be executed on whatever peer - it would produce same results - so it is deterministic.

vukolic commented 8 years ago

@manish-sethi

Finally, what are your thoughts on an orthogonal dimension related to parallelism where we may not maintain a global ledger at all and let all the components (blockchain/ledger/consensus) be there separately for separate trust groups (say for a each chaincode in a simple setting) and different trust groups run in parallelism.

Sharding is certainly a technique we can apply to the fabric - and the plan is to eventually do so in HL fabric. The way I see things, doing simple partitioning is trivial - and we can easily do it. Yet, sooner or later, one would need to come up with some semantics/design to support cross-partition transactions (i.e., cross-subchain transactions), which is non-trivial. Without cross-partition txs, we can very well have the parallelism you mention, in both the current HL fabric architecture and the next proposed one (i.e., this one).

vukolic commented 8 years ago

@JoVerwimp Thanks for your comments (and apologies for the high latency)

non-content comments

  1. ack - good idea, will add a subsection
  2. agree - let's try to address this iteratively...

content:

  1. (attn @elli-androulaki) As the design currently stands, endorsers may be/are revealed through the endorsement policy. I see no simple way to get around this, as peers must be able to establish that a tx is valid, which requires some computation - here, this is done by verifying the endorsers' signatures. How big an issue do you see this as?
  2. agree - this alternative design will probably not be applied. Please notice here that a ("somewhat"-Byzantine) peer may omit to send TRANSACTION-INVALID and we could not really reliably detect this.
  3. ack - will address
  4. ack - will add
JoVerwimp commented 8 years ago

Thanks @vukolic. (attn @elli-androulaki) On content point 1, not exposing the stakeholders' identities: is there a possibility of using (something similar to) transaction certificates for producing the endorsement signature?

The endorser would get a 'transaction certificate' from the CA, which later allows for validation of the signature without exposing the identity.

vukolic commented 8 years ago

a minor revision has been posted addressing some terminology-consistency comments from @christo4ferris and @JoVerwimp, comments 6, 13 and 14 from @mm-hl, and a few other minor changes.

mm-hl commented 8 years ago

@vukolic regarding @manish-sethi's 2a), you point out that both it and Eve "fall back to sequential" in case there are conflicts. Right, but these fallbacks happen at quite different levels (according to my interpretation of @manish-sethi's comments, which admittedly may be colored by my own thoughts in similar directions). With Eve, if conflicting results arise, a higher-level protocol tries to choose one of them, and if this fails, falls back to sequential execution. In contrast, in the "2a" idea, the "falling back to sequential" happens within each peer that encounters the conflicts, so the end result is that all honest peers determine the same result (some may have succeeded with concurrency optimizations while others may have fallen back to sequential execution). Thus, there is no allowed divergence, and therefore no introduction of nondeterminism that wasn't already there, and a simpler protocol, because all honest peers get the same result for each block even when it is executed in parallel.

Regarding smart contract languages, nondeterminism, etc., seems like we're on the same "ideal" page. I did not mean to suggest that nondeterminism was directly to blame (and agree it wasn't) but rather to point out that the recent events show very clearly that we need better language support for smart contracts, so it bothers me a bit to see design directions seemingly being pursued that bake in some of what I view as harmful aspects of the current pragmatic choices that should preferably be eliminated in time.

manish-sethi commented 8 years ago

@mm-hl yes, your interpretation of what I had in mind while writing "2a" is correct (i.e., the execution at each node is independent). Though I am not sure about one node's execution not observing any conflict while conflicts are observed at some other node (assuming all transactions are processed on the committed state as of the last block commit): in fact, they would observe the same conflicts, because a chaincode would read/write the same data if executed against the same state (assuming deterministic code). However, I would like to elaborate a bit further on falling back to sequential execution. If there are only a small number of conflicts, the approach was to roll back and re-execute only the conflicting transactions, in the relative order in which they appear in the block. (This requires monitoring for fresh conflicts with the earlier non-conflicting transactions - hopefully a rare situation, but still required for correctness.) So, falling back to sequential execution of the whole block was a worst-case choice based on thresholds: 1) the number of conflicts in the first round, and 2) the number of iterations that produce fresh conflicts when re-executing only the conflicting transactions.

roger505 commented 8 years ago

Section 2.5, 2nd paragraph: reading the description, I'm wondering whether the proposed method covers all possible isolation levels.

Consider a scenario where two transactions (A, B) are entered and part of the chaincode business logic is to check whether the other transaction exists and, if it does, to mark each as matched. The chaincode is implemented by storing each of A and B as a key-value pair, with keys A and B.

When item A executes, it searches for B but does not find it. Likewise, when item B executes, it searches for A but does not find it.

If the consensus model only considers the read variables and the updated variables, then if A and B are each executed on versions of the world state that don't contain the applied state of the other transaction, and the state updates are then combined in the same block, it is possible for both A and B to be entered into the system in an unmatched state.

The problem is clearly the “search” for items that don’t yet exist / are not committed.

One solution would be to partially serialise execution. In the example above, it might be possible to insist that these transactions are executed serially on the same endorsing peer; however, this would need endorsing peers to execute subsequent transactions against the proposed world state rather than against the committed world state.

Is this the proposal in 2.3?

Another solution would be for the application to store A and B under the same key, say AB; however, this imposes restrictions on the programming data model and moves to a model of few keys with large blobs of data containing internal structure, which may itself introduce performance problems.

A final solution might be to record the search criteria along with the keys read or written, and then, in the consensus model, to check whether any keys have been entered into the searched range between execution and commit.
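A hedged sketch of this last idea: the scanned range is recorded next to the read-set and re-checked at commit time. All types and names here (`RangeRead`, `stillValid`) are illustrative, not the actual fabric data structures:

```go
package main

import "fmt"

// RangeRead records the search criteria of a range scan performed during
// transaction simulation.
type RangeRead struct {
	Start, End string   // keys scanned: Start <= k < End
	SeenKeys   []string // keys that existed in the range at simulation time
}

// stillValid returns false if the committed state now contains a key in the
// scanned range that the transaction did not see during simulation (a phantom).
func stillValid(r RangeRead, state map[string]string) bool {
	seen := map[string]bool{}
	for _, k := range r.SeenKeys {
		seen[k] = true
	}
	for k := range state {
		if k >= r.Start && k < r.End && !seen[k] {
			return false
		}
	}
	return true
}

func main() {
	state := map[string]string{"trade_A": "..."}
	r := RangeRead{Start: "trade_", End: "trade_z", SeenKeys: []string{"trade_A"}}
	fmt.Println(stillValid(r, state)) // true: nothing new in the scanned range

	state["trade_B"] = "..." // another transaction inserted a matching key
	fmt.Println(stillValid(r, state)) // false: phantom detected, tx marked invalid
}
```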

vukolic commented 8 years ago

@roger505 I am not sure I understand the issue

your "chaincode" seems to have following two tx A) A=1; if B==\bot then executeSomeWinnerCode B) B=1; if A==\bot then executeSomeWinnerCode i.e., if they execute concurrently on the initial state we have A.readset={(B,\bot)}, A.writeset={(A,\bot)}, A.stateUpdate={(A,1)} B.readset={(A,\bot)}, B.writeset={(B,\bot)}, B.stateUpdate={(B,1)}

as consensus orders one transaction before the other, the first transaction (per raw-ledger order) is valid and committed (regardless of isolation level). Notice that even when transactions belong to the same batch, there is still an order among them.

In case of serializability the second (per raw ledger order) tx is invalid. Under some other isolation level (e.g., SI) the second tx could be valid as well.
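For concreteness, a small illustrative sketch of this validation under serializability (not the actual fabric code): the first transaction per ledger order commits and bumps the version of the key it wrote, so the second fails its read-set check and is marked invalid.

```go
package main

import "fmt"

// Tx carries the read-set versions observed at simulation time (0 = key absent)
// and the set of keys it writes; the names are illustrative, not fabric code.
type Tx struct {
	ID       string
	ReadSet  map[string]int
	WriteSet []string
}

// validateAndCommit checks the version dependencies and, if they hold, bumps
// the version of every written key (serializability-style validation).
func validateAndCommit(versions map[string]int, tx Tx) bool {
	for k, v := range tx.ReadSet {
		if versions[k] != v {
			return false // version dependency violated: tx is invalid
		}
	}
	for _, k := range tx.WriteSet {
		versions[k]++
	}
	return true
}

func main() {
	versions := map[string]int{} // empty initial state: every key is at version 0
	txA := Tx{ID: "A", ReadSet: map[string]int{"B": 0}, WriteSet: []string{"A"}}
	txB := Tx{ID: "B", ReadSet: map[string]int{"A": 0}, WriteSet: []string{"B"}}

	fmt.Println("A valid:", validateAndCommit(versions, txA)) // true: commits first per ledger order
	fmt.Println("B valid:", validateAndCommit(versions, txB)) // false: it read A at version 0, which is now 1
}
```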

let me know if I am missing sth

mm-hl commented 8 years ago

@manish-sethi yes, there could be many strategies for optimizing execution of a batch of transactions. The only requirement would be that they are executed "as if" sequentially, which is what's required to avoid the complexity (e.g., in Eve) that arises from multiple honest peers getting different results for the same batch.

roger505 commented 8 years ago

@vukolic I agree that the example you quote will be safe; however, in the use case I was thinking of, the matching is done with a range search rather than an explicit read, as there is a "matching tolerance". How would the code represent a "RangeQueryState" where no key is found? I agree that under some isolation levels this would not be protected against; I was just keen to understand whether in this design it is possible to configure the system so that it is safe, even if that comes at the expense of some limitations, such as forcing potentially problematic transactions to have the same endorsers.

learner4 commented 8 years ago

My thoughts - would it be good to consider that the identities involved in a transaction be verified during this process? If consensus and endorsement are performed on transactions, the transactors involved should also be verified in a regulated environment. Endorsing the "content" of the transaction may be only a part of the solution; endorsers would also need to validate the identities - an easy example being OFAC checks.

roger505 commented 8 years ago

Is it possible to explain the confidentiality model further please? I notice that the tran-proposal contains the spID and the clientID, and that the tran-proposal is included in a blob, and the blob is included in the chain.

Would the spID and clientID not reveal something about the submitter of the transaction?

How does this fit with transaction certificates to obscure the origin of the transaction?

vukolic commented 8 years ago

@roger505 more details on the confidentiality are following... Pls stay tuned.

UPDATE: In the meantime, clientID is gone, as it is actually not necessary. In principle, spID should not reveal anything about the client nor leak any useful information. It is, however, useful, as it denotes the peer who computed the transaction results: it may prove useful for accountability (e.g., identifying a peer submitting transactions with invalid endorsements) as well as to help uniquely identify a transaction.