Next Consensus Architecture discussion

vukolic commented 8 years ago

Use the comment field to discuss the next consensus architecture proposal.

roger505 commented 8 years ago

@vukolic Sorry to have jumped the gun.

In your confidentiality update, can you please consider whether one possible deployment could be a one-to-one mapping between clients and submitting peers, in which case would the spID not reveal the clientID? Or do you regard the only supported configuration as being submitting peers having a one to many relationship with clients ?

srderson commented 8 years ago

All - Please be aware that there is a recommendation that proposals should be moved from the wiki to the repository.

PR: https://github.com/hyperledger/fabric/pull/2082 Mailing list discussion: http://lists.hyperledger.org/pipermail/hyperledger-fabric/2016-June/000127.html

As this is one of the most active proposals, I figured people here may have opinions on the subject. Please weigh in on the mailing list if you have a preference one way or the other.

JoVerwimp commented 8 years ago

@roger505, @vukolic "Or do you regard the only supported configuration as being submitting peers having a one to many relationship with clients ?" so would that not move to a centralised model? i.e. where the submitting peers are centralised and servicing multiple clients. Since the submitting peer needs to prepare the transaction, it needs access to all data (possibly including off chain confidential data, including coordinating the updates in off-chain and on-chain data stores).

manish-sethi commented 8 years ago

@srderson, @vukolic, @corecode: moving the version-related discussion here. From 2.2: "All k/v entries are versioned, that is, every entry contains ordered version information, which is incremented every time when the value stored under a key is updated". An alternative to the version number is to use the ID of the previous TX that most recently modified the key. The benefits of using the previous transaction-id instead of a number-based version are:

1) This does not require introducing a new concept (auto-incrementing version) at the protocol level. It would be conceptually closer to the bitcoin data model, where previous-transaction-id is similar to an input and newValue is similar to a UTXO for the transaction.
2) Version info alone may not be sufficient in the case of a consensus protocol that allows forks. For example, when using PoW with an endorsement-based approach, think of a scenario where a transaction was endorsed on fork A by all the endorsers and committed by the committers on fork B. If fork A and fork B happen to have the same version for a key, the transaction will get committed to fork B with unintended state.
3) At the implementation level, the explicit mention of the previous transaction points to a structured record at the storage level: an index on transaction-id could be reused for tracking the change history for a key.

However, the downside of using the transaction-id is that it would be significantly bigger than a number for the version, leading to an increased payload. If we finally choose number-based versioning, then for the scenario highlighted in the second point above, the endorsement also needs to carry the hash of the top block against which the transaction gets simulated (does this call for consensus-specific info in the endorsement in general?).
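The fork scenario from the second point can be sketched in a few lines. This is only an illustration under assumed names: `Entry`, `valid_by_number`, `valid_by_tx_id` and the transaction IDs are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical state entry carrying both kinds of version info."""
    value: str
    version: int   # auto-incrementing counter
    last_tx: str   # ID of the transaction that last wrote the key


def valid_by_number(read_version: int, current: Entry) -> bool:
    # Number-based check: the version read at simulation must still be current.
    return read_version == current.version


def valid_by_tx_id(read_tx: str, current: Entry) -> bool:
    # Transaction-id check: the last writer seen at simulation must be unchanged.
    return read_tx == current.last_tx


# The same key was updated on each fork by different transactions,
# and both forks happen to have reached the same version number.
fork_a = Entry(value="100", version=2, last_tx="tx-a7")
fork_b = Entry(value="250", version=2, last_tx="tx-b3")

# A transaction simulated (endorsed) on fork A records what it read there.
read_version, read_tx = fork_a.version, fork_a.last_tx

# Validation at commit time happens against fork B.
number_check_on_b = valid_by_number(read_version, fork_b)  # conflict is missed
tx_id_check_on_b = valid_by_tx_id(read_tx, fork_b)         # conflict is caught
```

With number-based versions alone, the check against fork B passes even though the value read on fork A differs. Carrying the hash of the top block in the endorsement would be another way to catch the mismatch.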

Thoughts?

manish-sethi commented 8 years ago

@vukolic: while moving the transactions from raw ledger to validated ledger, in addition to dropping the invalid transactions (because of version conflicts), I think that we should drop the read-set info from the valid transactions as well (to reduce the storage requirement), because read-set info is mainly useful only for validation. Do you see any problem in this?
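A minimal sketch of that transformation, assuming a hypothetical transaction shape (`id`, `read_set` mapping key to the version read, `write_set` mapping key to the new value) and integer versions that start at 0:

```python
from typing import Dict, List, Tuple

Tx = dict  # hypothetical shape: {"id": str, "read_set": {...}, "write_set": {...}}


def build_validated_ledger(raw_ledger: List[Tx]) -> Tuple[List[Tx], Dict[str, Tuple[str, int]]]:
    """Replay the raw ledger in order, drop stale transactions, strip read sets."""
    state: Dict[str, Tuple[str, int]] = {}  # key -> (value, version)
    validated: List[Tx] = []
    for tx in raw_ledger:
        # A transaction is stale if any key it read has changed since simulation.
        stale = any(
            state.get(key, (None, 0))[1] != version
            for key, version in tx["read_set"].items()
        )
        if stale:
            continue  # invalid transactions never reach the validated ledger
        for key, value in tx["write_set"].items():
            state[key] = (value, state.get(key, (None, 0))[1] + 1)
        # Only the ID and the state updates are kept; the read set is dropped.
        validated.append({"id": tx["id"], "write_set": tx["write_set"]})
    return validated, state


raw = [
    {"id": "tx1", "read_set": {"a": 0}, "write_set": {"a": "1"}},
    {"id": "tx2", "read_set": {"a": 0}, "write_set": {"a": "2"}},  # read a stale version
    {"id": "tx3", "read_set": {"a": 1}, "write_set": {"a": "3"}},
]
validated, state = build_validated_ledger(raw)
```

Here `tx2` is dropped because it read version 0 of `a` after `tx1` had already bumped it to 1, and the surviving entries carry no read-set info.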

manish-sethi commented 8 years ago

@vukolic - from section 1.2: "The state of the blockchain ("world state") has a simple structure and consists of a key/value store (KVS)". I think that we should not restrict the description to a key/value store. Different ledger implementations could support different data models. At the protocol description level, we can be more general (e.g., a transaction simulation produces the info that is necessary for validation and for making state changes). A KV-store can be one simple data model that we describe for illustration, though.

My thought process is this: suppose someone wants to manage state in a more complex data model, such as a full-fledged relational database. Specifying stateUpdate in key-value form should still be possible, because key-value is the most basic form [e.g., one can use the combination tablename.rowId.columnName as the key]. However, one may prefer to specify the state info in a more compact form (say, in a nested proto message, in order to avoid repeating the table name if a transaction updates many rows in a table).
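To make the two representations concrete, here is a small sketch. The helper names `flatten` and `unflatten` are hypothetical, a plain nested dict stands in for the nested proto message, and table, row and column names are assumed not to contain a dot.

```python
from typing import Dict

# Compact form: table -> row id -> column -> value.
NestedUpdate = Dict[str, Dict[str, Dict[str, str]]]


def flatten(update: NestedUpdate) -> Dict[str, str]:
    """Expand a nested update into tablename.rowId.columnName keys."""
    flat: Dict[str, str] = {}
    for table, rows in update.items():
        for row_id, columns in rows.items():
            for column, value in columns.items():
                flat[f"{table}.{row_id}.{column}"] = value
    return flat


def unflatten(flat: Dict[str, str]) -> NestedUpdate:
    """Rebuild the nested form from tablename.rowId.columnName keys."""
    nested: NestedUpdate = {}
    for key, value in flat.items():
        table, row_id, column = key.split(".", 2)
        nested.setdefault(table, {}).setdefault(row_id, {})[column] = value
    return nested


nested = {
    "accounts": {
        "42": {"owner": "alice", "balance": "10"},
        "43": {"balance": "7"},
    }
}
flat = flatten(nested)
```

The flat form repeats `accounts` once per column, while the nested form names the table once. Both describe the same update, which is why the choice could be left to the ledger implementation.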

Essentially, I just prefer to leave the scope of representation of stateUpdate open ended and let the ledger implementation decide this. In fact, my earlier thoughts on using previous-transaction-id instead of auto-incrementing versions may also fall in this category. What are your thoughts on this?

ijmitch commented 8 years ago

I've tried to review the comments here - but there's quite a lot(!) so forgive me any duplication I'm about to create.

Would it be useful to settle on one of the terms validation or endorsement? Certainly a brief but specific definition of what either/both of these mean would help me. My working definition is something like agreement with another peer that execution of the chaincode against the current state has resulted in the same modifications to the state. I guess we're in a transition phase from one term to the other - it is explicitly stated "validating (i.e. endorsing)".
Is there any meaningful distinction between what is execution of a transaction in a submitting peer (first para of 2.2) and what is the simulation of the transaction during endorsement (2.3 first bullet)? (Other than one happens in a submitting peer and the other during endorsement.)
There seems to be a lot of implicit reliance on the disjointedness of state modifiable in a given transaction (ie the keys associated with the targeted chaincode). Could this be made more explicit?
For example, I presume that an endorser would report STALE_VERSION if it got a deliver from the consensus service with updates that have yet to make it to the submitter - so the assumption is a low rate of clashing transactions (deliver being the result of a recent transaction for the same chaincode from this submitter or another one).
I interpret this separation between endorsement and consensus being a separation of the integrity of the intended actions of the chaincode (the 'app') vs the replication of valid updates to database instances (of a ledger data structure). Taking this as a beneficial change.
The separation is so clean that I can't tell anything about how a consensus service would actually work - if you want me to think of it as a black-box then it's met that objective. If you want me to understand the system end-to-end, then that's not so good.
3.2 is confusing me... second para is mixing up peers and consenters in a way I don't understand.
How does a single blockchain with a partitioned consensus service differ from separate blockchains (except for the obvious sharing or not of the database instance)?

I'll post comments about state transfer and checkpointing separately.

JoVerwimp commented 8 years ago

@manish-sethi Moving the k/v versioning away from a mere integer does make sense, given the arguments you mention. The transaction-id may indeed be longer, but I do not believe that to be a problem currently (storage is cheap, right?). I also see no issue with stripping the transaction in the validated ledger of all unnecessary info, including its read set, which is a subset of the write sets of previous transactions. Your comment on 1.2 makes sense to me, although for now a simple KVS simplifies documentation & implementation.

@ijmitch I agree with your statement that it would be best to make explicit the assumption that the application should be such that conflicts between write sets and subsequent read sets are naturally rare. Applications that have constant updates to a limited set of data do not fit this paradigm. From what I have read so far, endorsement is executing (simulating the execution of, if you like) chaincode and checking (validating?) that, with the same input, the same data is read from the world state and the same data is written to the world state. Consensus is about agreeing on an order of transactions in a block and detecting conflicts between their (now ordered) write sets and later read sets. I do like the separation of concerns and the flexibility it buys you for deploying the consenters without access to any storage (they are not peers; they become stateless). On the comment about a single blockchain with partitioned consensus vs. separate blockchains: a single blockchain would always apply the blocks (and successful transactions therein) to the entire blockchain, even if the consensus was reached with contributions from a subset of the consenters (depending on the selected consensus algorithm). Separate blockchains would (currently?) not be interacting (although with this proposal you could imagine that they can use the same consenting nodes without many updates...).

manish-sethi commented 8 years ago

@JoVerwimp - about 1.2, I have no issues with the details being illustrated using a KV data model. However, what I prefer is an explicit agreement on supporting any ledger (data-model) implementation, and reflecting that in the write-up, maybe just by adding a couple of lines that highlight the intent. In the current writing, it is easy for someone to assume that a KV-store is the only data model supported, which may lead to confusion (or wrong assumptions) between folks implementing consensus protocols and folks implementing a ledger for a data model.

ijmitch commented 8 years ago

Since it's been mentioned, I would very much support moving proposals such as this from a scheme of using a wiki page(s) and a single issue for discussion to a set of docs under change control which we can fork, have more focused issues on, and submit pull requests for suggested updates to.

vukolic commented 8 years ago

@manish-sethi 1) version numbers: You already know my views on version numbers, and they are summarized in your excellent comment. In a nutshell, the question boils down to the following: are we willing to pay a price for future-proof support of PoW-style forked raw ledgers by accepting the overhead in the size of the raw ledger (more specifically, version numbers)? Note that this overhead is not needed if the raw ledger is a single, no-forks chain. I prefer letting other community members voice their opinion here.

2) "while moving the transactions from raw ledger to validated ledger, " This is true, yet the same argument applies to the writeset, not only the readset (this is less obvious since our implementation, both v0.5 and "v2", has writeset and stateUpdates combined). For example, if version numbers are monotonically increasing integers, keeping the writeset in the validated ledger also does not contribute much, as one could obtain the version numbers by simply counting, in the validated ledger history, how many times a given key changed. In this sense, the readset actually contains more information beyond stateUpdate than the writeset does. In the end, the "right" answer will depend on how much "history-tracking" we want to facilitate in the validated ledger.

3) re comment on 1.2 - I fully agree, yet the intention was to have 1.2 as a spec that is (trivially) implementable with key-value stores, but also with RDBMSs and other kinds of stores. It was also added in response to a previous comment that argued against the "vague" description of state. I propose to explain this point in plain English in Section 1.2, to suggest that implementing this state with something other than an actual KVS is perfectly possible, and perhaps to remove the currently unused KVS-like API calls (list/delete).
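The counting argument in point 2) can be illustrated as follows. This is a sketch under the assumption that versions start at 0 and increase by exactly one per committed update; the entry shape (`id`, `updates`) and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def versions_from_history(validated_ledger: List[dict]) -> Dict[str, int]:
    """Recover each key's current version by counting its committed updates."""
    counts: Counter = Counter()
    for tx in validated_ledger:
        for key in tx["updates"]:
            counts[key] += 1
    return dict(counts)


# Validated ledger entries holding only state updates, no version numbers.
history = [
    {"id": "tx1", "updates": {"a": "1", "b": "x"}},
    {"id": "tx3", "updates": {"a": "3"}},
]
versions = versions_from_history(history)
```

Under that assumption, storing version numbers next to the updates adds no information that the validated history does not already contain.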

vukolic commented 8 years ago

@ijmitch

1) Would it be useful to settle on one of the terms validation or endorsement?... I guess we're in a transition phase from one term to the other - it is explicitly stated "validating (i.e. endorsing).

Precisely so (transition period). Will try to state that once and then use "endorsement".

2) Is there any meaningful distinction between what is execution of a transaction in a submitting peer (first para of 2.2) and what is the simulation of the transaction during endorsement (2.3 first bullet)? (Other than one happens in a submitting peer and the other during endorsement.)

In principle there is no difference (same chaincode is executed, with same transaction payload), except that the result of the execution need not be the same, for different reasons including but not limited to: a) non-deterministic executions - including calls to local off-chain state, b) executing different code branches depending on, say, the endorser's own id.

3) There seems to be a lot of implicit reliance on the disjointedness of state modifiable in a given transaction (ie the keys associated with the targeted chaincode). Could this be made more explicit?

This is implicit as we felt hard coding this may be premature until we start talking about cross-chaincode transactions. In principle there are at least two approaches here i) partition the state across chaincodeIDs, but allow chaincode invoking other chaincode. We need to facilitate cross-chaincode transactions. ii) Have chaincode declare some variables as "private" and some variables as "public" where "public" variables could be modified by any chaincode.

ii) seems more difficult to manage than i), hence there is an implicit bias towards i), but not an explicit one yet.

4) For example, I presume that an endorser would report STALE_VERSION if it got a deliver from the consensus service with updates that have yet to make it to the submitter - so the assumption is a low rate of clashing transactions (deliver being the result of a recent transaction for the same chaincode from this submitter or another one).

Precisely. That said, and also as discussed in previous responses to @mm-hl, the plan is to add, in future, some ability to chaincodes to reason about the concept of a leader that would handle the higher rate of clashing transactions.

5) The separation is so clean that I can't tell anything about how a consensus service would actually work - if you want me to think of it as a black-box then it's met that objective. If you want me to understand the system end-to-end, then that's not so good.

This is actually on purpose so I am glad that the objective is met :) Now, for internals of the consensus service, these will look very much like classical consensus protocols (be they Byzantine fault tolerant (BFT) or crash-fault tolerant (CFT), e.g., PBFT, Paxos, Raft, etc.) augmented with Kafka-like (pub-sub like) notification of all peers of the new raw-ledger block deliver() event and a hashchain that puts together consecutive deliver events. Implementation details are omitted as the goal was to specify the API of the consensus service, not to mandate what the implementation (and the fault model) will look like. In principle, we wanted the spec flexible enough to easily accommodate in future not only BFT/CFT "classical" consensus but also PoW fork-style consensus protocols (although, strictly speaking, the proposal - as it currently stands - does not allow PoW). Hence - not too many details of the box internals.

6) 3.2 is confusing me... second para is mixing up peers and consenters in a way I don't understand.

Will have a look.

7) How does a single blockchain with a partitioned consensus service differ from separate blockchains (except for the obvious sharing or not of the database instance)?

This is related to 3) and will be more obvious once we spell out cross-chaincode tx.
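As a rough illustration of the black-box view in answer 5), the sketch below models a consensus service that exposes only broadcast and deliver, with a hashchain linking consecutive deliver events. Everything here is an assumption made for illustration: the class and method names, the batching rule, and the `(seqno, prev_hash, blobs)` shape of a deliver event. A single in-process object stands in for a real BFT/CFT protocol.

```python
import hashlib
import json
from typing import Callable, List

GENESIS_HASH = "0" * 64


def block_hash(seqno: int, prev_hash: str, blobs: List[str]) -> str:
    """Hash one deliver event, chaining it to the previous one."""
    payload = json.dumps([seqno, prev_hash, blobs]).encode()
    return hashlib.sha256(payload).hexdigest()


class ToyConsensusService:
    """Orders opaque blobs into blocks and notifies every subscriber."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size
        self.pending: List[str] = []
        self.subscribers: List[Callable[[int, str, List[str]], None]] = []
        self.seqno = 0
        self.prev_hash = GENESIS_HASH

    def subscribe(self, on_deliver: Callable[[int, str, List[str]], None]) -> None:
        self.subscribers.append(on_deliver)

    def broadcast(self, blob: str) -> None:
        # The service never inspects the blob; it only orders it.
        self.pending.append(blob)
        if len(self.pending) >= self.batch_size:
            self._cut_block()

    def _cut_block(self) -> None:
        batch, self.pending = self.pending, []
        self.seqno += 1
        for on_deliver in self.subscribers:
            on_deliver(self.seqno, self.prev_hash, list(batch))
        self.prev_hash = block_hash(self.seqno, self.prev_hash, batch)


class ListeningPeer:
    """Consumes deliver events and checks that the hashchain is unbroken."""

    def __init__(self) -> None:
        self.blocks: List[List[str]] = []
        self.last_hash = GENESIS_HASH

    def on_deliver(self, seqno: int, prev_hash: str, blobs: List[str]) -> None:
        if seqno != len(self.blocks) + 1 or prev_hash != self.last_hash:
            raise ValueError("gap or fork in delivered blocks")
        self.blocks.append(blobs)
        self.last_hash = block_hash(seqno, prev_hash, blobs)


service = ToyConsensusService(batch_size=2)
peer_a, peer_b = ListeningPeer(), ListeningPeer()
service.subscribe(peer_a.on_deliver)
service.subscribe(peer_b.on_deliver)
for blob in ["tx1", "tx2", "tx3", "tx4"]:
    service.broadcast(blob)
```

Both peers end up with the same sequence of blocks and the same chain head, without knowing anything about how the ordering was reached.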

manish-sethi commented 8 years ago

@vukolic about your point 2) above - yes, the write-set also belongs to the same category as the read-set. I just wanted to confirm that, from the consensus point of view, you are OK with a transformed representation of transactions when moving from the raw ledger to the final ledger, and with leaving that to be decided by the different ledger implementations.

In essence, I wanted to check whether you are fine with the consensus being oblivious of the following:

Representation of the simulation results in the blob (to support different data models and different versioning techniques such as previous_tx_id vs numbers)
transformation of transactions when moving from raw to validated ledger (as long as they have sufficient details for reproducing the state without executing the transactions)

Both of the above are produced and consumed by a specific ledger implementation, and I wanted to make sure that consensus does not depend on them in any manner, so that their specific details can be moved to the ledger discussion page (just trying to get to a better separation of concerns between consensus and ledger).

To take a specific example of version info representation - if a ledger implementation supports forks and wants to maintain numbering based versions, it can include the hash of the top block in the simulation results for endorsement and later validation. Similarly, another ledger implementation (or merely a configuration parameter in the same implementation) may decide to use previous tx_id. So, potentially we can discuss these things on ledger page.

vukolic commented 8 years ago

@manish-sethi indeed, consensus couldn't care less what one does when transforming the raw ledger into the validated ledger. In any transformation, however (not only the one you suggest), the raw ledger hashchain is "broken", so it needs to be re-established by the validated ledger (e.g., in the way proposed in Sections 4 and 5). We can carry on on the ledger page.
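Why any transformation breaks the raw hashchain can be seen in a toy example. This is not the scheme from Sections 4 and 5, which this thread does not reproduce; the `chain` helper and the block contents are hypothetical.

```python
import hashlib
from typing import List

GENESIS_HASH = "0" * 64


def chain(blocks: List[List[str]]) -> List[str]:
    """Return the running hashchain over a list of blocks of transaction IDs."""
    prev = GENESIS_HASH
    hashes = []
    for txs in blocks:
        prev = hashlib.sha256((prev + "|" + ",".join(txs)).encode()).hexdigest()
        hashes.append(prev)
    return hashes


raw_blocks = [["tx1", "tx2"], ["tx3"]]
invalid = {"tx2"}  # assumed to have failed validation

# Transformation: drop invalid transactions from every block.
validated_blocks = [[tx for tx in block if tx not in invalid] for block in raw_blocks]

raw_hashes = chain(raw_blocks)
validated_hashes = chain(validated_blocks)
```

Even though the second block is unchanged, its hash differs between the two chains because it links to a different predecessor. The validated ledger therefore cannot reuse the raw ledger's hashes and has to maintain a chain of its own.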

ijmitch commented 8 years ago

@vukolic : thanks for the comprehensive replies.

For my 5), I'm glad that objective is met too. In particular, I was looking at deliver() being an opportunity to integrate what we might have been looking for from the event framework. Since I'm looking at integration with existing systems, particularly on z Systems, I'm wondering whether this makes it plausible to register with the consensus service solely to get deliver() calls to be informed of what's going on. If a consensus client works in this mode, never broadcasting an update, then it wouldn't actually need to maintain a ledger or KVS in the classic sense.

vukolic commented 8 years ago

@ijmitch sure, a peer can be purely a listener, never submitting/endorsing transactions.

That said, such a peer may want to maintain a ledger for efficiency. For example, if such a peer crashes and recovers it would need to fetch state from other peers as if it was a new peer joining the network. Possible - but not sure it is practical.

ijmitch commented 8 years ago

@vukolic : my use case is that the listening peer is actually something like a gateway to the outside world which simply consumes the data on deliver() - there could be non-blockchain systems that are 'interested' in the transactions on which consensus has agreed. This needs some consideration of confidentiality and identity propagation, but deliver() is starting to sound attractive for SoR integration.

elli-androulaki commented 8 years ago

Hi, We just updated the new architecture proposal with a section referring to transaction (chaincode) confidentiality. The link to the new section (section 6) is this one: https://github.com/hyperledger/fabric/wiki/Next-Consensus-Architecture-Proposal#6-confidentiality In addition, more details were added to Section 2 on transaction lifecycle.

vukolic commented 8 years ago

Also, the latest revision contains updates to Sec. 1.2 and Sec. 1.3.3, in response to some previous comments here - notably from @manish-sethi (e.g., no 3), @ijmitch (e.g., no 3) and @mm-hl (e.g., no 33).

ijmitch commented 7 years ago

It's not obvious (to me!) from the new Section 6 how deployment of confidential chain code is limited to the submitting and endorsing peers - ie why committing peers don't see it. Perhaps it's that the deliver from the consensus service of the deploy transaction to the committing peers just asks them to write the encrypted blob to their copies of the ledger, whereas to the endorsers and submitters, it means do that and make the chain code operational - is that correct?

Next Consensus Architecture discussion #1631