Open jyellick opened 8 years ago
I would agree that in general, simple example chaincodes should not use persistent state. However, even though there was a bug in the busywork chaincode that caused unexpected non-determinism, I'm not sure I'm ready to banish chaincode-local state for chaincodes that are specifically designed as test programs. For example, one of the planned future capabilities for busywork was to allow non-deterministic error injection, which would make use of chaincode-local state. (A single chaincode instance is passed information via a query targeting its peer, and on the next invoke that chaincode instance stores a random or know-bad value, for example.)
There may also be optimizations for test programs (or even non test programs?) that require some knowledge of what is expected in the state, and perhaps cause information to be reloaded from the state when is there is a mismatch.I was actually planning to add something like this in the next busywork chaincode I've been thinking about. This chaincode loads a very large data structure and its version number when the chaincode starts. On each invoke, it compares the latest version number (8 bytes) from the state with its own cached version number, and only reloads the large state if they disagree. Obviously the version number is also updated whenever the large state is modified (which will be a rare event).
In general, in Go anyway, I don't think you can stop anyone from storing data in a package-global variable, and then accessing the variable, even if you make each chaincode invocation a newly-created object. It is certainly unacceptable to take the drastic step to reload the chaincode package binary every time the chaincode is invoked (assuming Go supported that, which I don't think it does currently). There is also the file system, network devices, and other ways to store and access persistent data. Unless we go to a domain-specific language.
It's clear that we will not be able to completely disable non-determinism while using go. However, we should make it as difficult as possible to mistakenly introduce non-determinism in chaincode. How hard it will be to test for non-determinism behavior should not influence our API design for correct chaincode.
Related: #1751
We have developed a reasonably big chain code as a prototype application, and found that it is quite easy to write a non-deterministic code. For example, one can write non-deterministic chain code by
I am sure we can find more. The problem is that it is quite difficult to debug the non-deterministic chaincode. An app developer can easily develop a chaincode that is non-deterministic unless he has sufficient understanding in the behaviour of Go programs. Some kind of domain specific language, or a validation tool, should certainly help.
@bcbrock I can certainly appreciate that if your interest is specifically related to testing the chaincode itself, that you might need to store some local state. Maybe an easy compromise would be to throw a comment in the local data structure pointing out that it's a dangerous practice and one of its purposes is to allow injection of faults into the system.
@corecode @muralisrini I am no expert by any means on chaincode lifecycle, but it seems like it would be nice if there were some event or invocation that could signal a chaincode that it has restarted, or missed invocations, or any other sort of event which could cause its local state to become out of sync with the ledger. @bcbrock mentioned thinking to query a single key, check it against a local version, to decide if he needs to query the full data structure. This seems like it could be a common pattern, so if we could provide some sort of 'invalidate' whenever the chaincode's state is modified not through an invoke, it might make such programming easier and less error prone.
As another comment, I discussed briefly with @muralisrini that it would be nice to include some chaincode tooling which could try invoking a chaincode multiple times, randomly restarting the chaincode periodically, and ensuring that the result from all execution series matches.
@sachikoy +1 and perhaps part of fabric-chaintool
Any chaincode that's included in the fabric can serve as an example for others. Unless intentionally called out, such as @bcbrock's plan to inject non-determinism for test purposes (ref. one of the planned future capabilities for busywork was to allow non-deterministic error injection ), I'd say all chaincodes in the fabric should serve as good deterministic examples.
+1 to @sachikoy 's enumeration of possible ways to make it non deterministic (although I'd question parallel queries with an invoke categorized as "non-deterministic"... it doesn't make the chaincode itself non-deterministic but only affects the application that uses the query, which is a very different thing). We should begin listing all such usages which then could serve as things a validation tool (as @binhn suggested, perhaps part of fabric-chaintool ) could check for (thoughts @ghaskins ?). Here's a non-exhaustive list (some of these can be relaxed as we move to new consensus architecture)
Go specifc (from @sachikoy )
What does everyone think about collecting a list either in Wiki or as a issue ( @binhn ? )
+1 to @jyellick suggest for a chaincode testing tool. This is different from the validation as its more a "black-box" evaluator.
We have no mechanism to realize that the chaincode may have to reload its state. For now, the most conservative strategy is to not cache anything in the chaincode. Clearly it already is very difficult to write deterministic chaincode in the first place. Let's leave out dangerous optimizations for the time being.
According to the top-down design and the assumptions of the blockchain architecture, every transaction invocation must start from the persistent state, as if freshly rebooted. Chaincode must never buffer state, keep variables, etc. Relying on any non-persistent state violates some assumptions made at the consensus protocol level, where it should be possible to restart a validating peer from a given state of the ledger at any time.
I agree that the examples presented in the code should either be demonstrating proper use OR at least clearly (very clearly) noting that we are intentionally doing something bad. Perhaps those cases should even be in their own "anti-pattern" part of the tree to prevent casual observer confusion.
That said, I agree with @bcbrock that we shouldn't preclude intentionally bad code because it could impact testability. I suppose one could argue that if we had tooling that could prevent badness, then the badness could never exist, to begin with. However, until we get to that point, the facilities for injecting bad behavior should remain.
Regarding state/restart, I fully agree with the comments here: the only valid way to think of chaincode is like a pure function S' = f(S, M) where it only considers the state "passed in" (which I know isn't quite the literal model, but it is the effective one). Anything else is a violation of proper behavior and ideally, should be discouraged if not downright policed (testing exceptions aside).
I digress: In this model, the code really should be restartable between invocations without any coordination with the chaincode itself...this is a property of the execution environment, not the code. That said, Golang presents some challenges in this regard because of how the language runtime works. For instance, we could play games with a C++ program to make the kernel suspend the process when it is not actively processing requests...this is not likely to be tolerated well with golang because presumably, the runtime needs to perform periodic tasks like GC that may be adversely impacted if we take the cpu time away from it.
The only solution for complete golang restart that seems remotely feasible would be a fork/exec model. I actually looked into this a while back. IIRC, my initial testing (on a meager laptop, I should mention) I seem to recall the overhead for a bare golang program was about 3.5ms/invoke whereas C++ was about 2.5ms. This would put the golang program at about 285tps max for a serialized flow (though note MVCC and other such tricks should allow that to parallelize for some/most access). That was slow enough to discourage me from pursuing further at the time, but it's fast enough that we might be able to revisit it.
Regarding chaintool code verification, yes, this is certainly possible. We would need to build an EBNF grammar that describes golang code (assuming a ready-made one can't be found on the internet) and an AST validator. I assume this would be a fairly complex topic in of itself. We might be able to get some analysis for free from tools like go-vet, too, which is an interesting service in of itself.
Let me point out that the top-down design and assumptions that @cca88 refers to above are non-obvious and completely undocumented in any form accesible to a user of this system. (I would not accept the low-level protocol and consensus API specifications as appropriate in this respect, even if they were complete.) Before spending time implementing tooling to enforce undocumented aspects of the architecture, some effort should be spent in actually documenting the architecture. Hopefully the transaction lifecycle document proposed in #2031 will prove to be a good start.
Note that the only reason that the "nondeterminism" found in the busywork chaincode was a problem is because of the undocumented implementation decision to "catch up" lagging peers by transferring state, rather than allowing the peer to actually execute every transaction. As far as I can tell there is nothing that fundamentally preculdes a database implemented as a blockchain from supporting stateful chaincodes. So I firmly disagree with @ghaskins about which thoughts are valid, so to speak.
If the Hyperledger fabric is truly going to be pluggable, we need to be careful not to disallow legitimate application architectures based on only one way of thinking about the problem.
@bcbrock the only reason that the "nondeterminism" found in the busywork chaincode was a problem is because of the undocumented implementation decision to "catch up" lagging peers by transferring state, rather than allowing the peer to actually execute every transaction.
Catching up is not the only case that'll trip chaincodes using "global" vars. Here's another one - Chaincode crashes and next invoke brings it up. All its global memory is gone and it computes differently from its counterparts in the chain.
I agree documentation is sparse in this area but that does not negate the fact that chaincode in the fabric should adhere to best practices.
@ghaskins Regarding chaintool code verification, yes, this is certainly possible. We would need to build an EBNF grammar that describes golang code (assuming a ready-made one can't be found on the internet) and an AST validator. I assume this would be a fairly complex topic in of itself.
This sounds a lot like building a DSL ? In any case, definitely intriguing.
+1 for DSL
@bcbrock I am not sure of the details of the issue you are referring to w.r.t. state transfer and busywork, but generally speaking non-determinism in a blockchain is always a problem IMO. Therefore, we should strive to help application developers write correct code, and detect/protect the system when they don't. I'm not sure if I misunderstood you, but from your last comment, it sounds like you are of the opinion that non-deterministic results are not only acceptable but possibly even desirable in certain scenarios? Or did I read that wrong? If so, could you elaborate?
Here are my thoughts:
As I see it, the job of a "validator" in general blockchain terms (i.e. ignoring fabric specific terminology/flows) is:
1) Take a transaction and decide if it is "legitimate". Whether something is legitimate will depend on a multitude of factors both of the fabric and chaincode(s) themselves. For instance, does the destination chaincodeID exist? Is the submitter authorized to invoke this operation? Were other rules of validity (such as proper UTXO lineage) followed? Etc. If the transaction is legitimate, it can then be agreed upon by some consensus protocol as to permit it to become a permanent part of the immutable record. For instance, 2f+1 out of 3f participants all came to the same conclusion. Part of this criteria should include "did I get the same result as the others?". Otherwise, there is no point of having a shared accounting of state, you only have an input/ordering agreement which is substantially weaker in the context of blockchain. Incidentally, this is also why I say raw PBFT without EV is not very useful.
2) The second phase is deciding whether the blockchain record itself is legitimate: E.g. Is the chain of blocks presented coherent, valid, signed by at least 2f+1 members of a whitelist that is locally deemed authoritative, etc?
It is specifically for the second case that state transfer exists. Some peers may not have been online at the time the block was committed (or have otherwise since lost their record of it, etc). It is not strictly necessary from a correctness standpoint for a node that is catching up to recompute the result. The record of at least 2f+1 signatures for the blocks in question either exists or it doesn't. If they exist, there is nothing that the peer that is catching up with can contribute to the network as a whole that is meaningful. If the node wants to re-execute the transaction(*) for its own verification rather than accept the state with signatures from other nodes, I don't have a specific problem with this (auditors may be interested in this, for instance).
The important takeaway here is that transaction-execution is primarily a service to the network, not the peer. Peers perform operation (1) in the hopes that they may be useful to achieve at least 2f+1 agreement. Once this threshold has been crossed, "validation" is a function of a cryptographic integrity check of the chain coupled with trust that 2f other nodes are unlikely to have colluded against you. The peer always has to have trust of the other probability that 2f of the remaining 3f nodes did the right thing or the protocol DOES NOT WORK. Therefore, the only truly useful local validation, in this context, is operation (2).
*) An important side note is that this is not always possible: Some nodes may not be able to run code (non-validating peers or nodes outside an endorsing scope, for instance). In any case, the only result of this operation, when possible, would be to confirm that the peer can recompute the same value as the rest of the network in the 2f+1 set. It should never get a different result IMO. If it does, it indicates a problem in the network as non-determinism has no place in a blockchain. I, therefore, don't understand your comments that the busywork result was somehow impacted by state transfer rather than recomputation unless the chaincode was (intentionally, or otherwise) non deterministic and therefore, IMO, broken.
To add to @muralisrini's list, it's not just catching up or crashing, but can be as simple as a transaction that for whatever reason needs to be rolled back (for instance, failed consensus). If the network is structured as pure functions, I can take S' = f(S, M) and simply throw S' in /dev/null if I need to without any coordination with the chaincode. If f() maintains global state, I cannot.
Another problem would be concurrent access. If f() is pure, I should be able to compute S' = f(S, M1) and S'' = f(S, M2) concurrently and do tricks like an MVCC merge of [S', S''] when unreconcilable conflicts are not present.
@muralisrini I am actually in favor of some kind of DSL for chaincode to help some of the things we have been discussing (non-determinism in GPLs). However, in this context, I was referring to an EBNF/AST for golang which is what would be needed to build a parser/analyzer.
@ghaskins - There are different types of "non-determinism". A chaincode that stores random numbers obtained from nature is non-deterministic, no argument. A chaincode that is aware of its own creation, and then caches data from the database at creation and updates cached data consistently with that environment is only made "non-deterministic" by an implementation (like the Hyperledger fabric) that modifies the state owned by the chaincode without the chaincode's knowledge. This is an implementation choice, not a fundamental requirement or definition of a "blockchain". That was my point.
You've made a good argument as to why the Hyperledger fabric should be implemented the way that it is. I'm not arguing that this choice is wrong, only that there are other ways of doing things that are also valid. If restrictions are made in user behavior they need to be supported by good arguments such as yours and not simply cast as being obviously correct because of the way something happens to be implemented today.
@ghaskins right, I understand now.
@bcbrock I see, and yes this makes more sense than the view that non-determinism is ok. Rather, the point is that transaction semantics were not apparent to you and/or documented in a reasonable manner, point taken.
In defense of the fabric team, this general notion has precedent in the rest of the cryptocurrency world. Networks like Bitcoin and Ethereum enforce this via the execution environment they provide. For instance, Ethereum's EVM doesn't offer a mechanism to create side effects, and therefore transactions can just be dropped silently with no impact to and/or coordination from the contract as part of its normal flow.
The choice to use a general purpose language like Golang rather than a DSL like Solidity has pros and cons. One of the cons is it's really easy (as this case demonstrates) to create side-effects like caches and go-routines that can escape the platforms transactional boundaries. One way to retain the use of golang would be to extend the transactional hooks into the application (e.g. register for rollback notifications, etc). This would allow your app to cache (when it can) but clean up when it's required. However, this adds complexity that is often hard to get right especially as it requires each individual application to implement the meat of the rollback. My opinion is we are better off attacking the problem differently: either create a DSL which is conducive to constructing side-effect free functions, or tooling which helps people write golang in a side-effect free manner and to verify its conformance.
We do not necessarily have to opt for one solution. Just like there are many "normal" programming languages today - we can have many programming languages for blockchain. So we could have both golang and other high level languages and DSLs.
That said, it would be nice to see community effort of porting Solidity to HL fabric.
Description
After some discussion with @muralisrini related to the non-determinism bug found and fixed in PR #1969 it was suggested that an issue be opened to remove the chaincode instance variables from the example chaincodes included in the fabric tree, including the excellent busywork suite from @bcbrock.
As others may try to use these chaincodes as a starting point for their own application, we should demonstrate chaincode best practices in tree, and ensure that none of our chaincode relies on local state which will not be replicated across chaincode container restarts, or state transfer.
Another bolder suggestion would be to completely disallow variable persistence from chaincode invocation to chaincode invocation, possibly by forcing chaincode instances to be recreated from invocation to invocation, possibly via something like a factory pattern as suggested by @corecode.