Allow deletion of raw blocks via PutRawBlock

jyellick commented 8 years ago

Presently, it is not possible to remove blocks from the blockchain.

This ability will be necessary in situations where consensus agrees that the current block number is lower than a block number the ledger possesses.

A proposed implementation would be to allow PutRawBlock in ledger.go to accept nil as a value to indicate deletion of a particular block number.
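Here is a minimal sketch of the proposed semantics. It is written in Python for brevity, not in the Go used by ledger.go. The InMemoryLedger class and its put_raw_block/get_raw_block methods are hypothetical stand-ins, the real PutRawBlock signature is not reproduced here, and None plays the role of Go's nil.

```python
from typing import Dict, Optional


class InMemoryLedger:
    """Hypothetical stand-in for a ledger that stores raw blocks by number."""

    def __init__(self) -> None:
        self._blocks: Dict[int, bytes] = {}

    def put_raw_block(self, block_number: int, raw: Optional[bytes]) -> None:
        # Proposed semantics: None (Go's nil) means "delete this block number".
        if raw is None:
            # Deleting an absent block is treated as a no-op in this sketch.
            self._blocks.pop(block_number, None)
        else:
            self._blocks[block_number] = raw

    def get_raw_block(self, block_number: int) -> Optional[bytes]:
        return self._blocks.get(block_number)


ledger = InMemoryLedger()
ledger.put_raw_block(0, b"genesis")
ledger.put_raw_block(1, b"block-1")
ledger.put_raw_block(1, None)  # delete block 1
print(ledger.get_raw_block(1))  # None
```

The sketch treats deleting an absent block as a no-op rather than an error. That is its own design choice; the proposal does not specify this behavior.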

ghaskins commented 8 years ago

If I understand what you are saying, I think this condition would only be possible with certain consensus algorithms (e.g. proof of work) and would be pathological in others (anything with a quorum-certificate type model).

Assuming that is correct, is there any merit in allowing this policy decision to be configurable and/or enforced by the consensus plugin in effect? On one hand, allowing block deletion in a configuration that should never see rollback seems dangerous. On the other hand, I am not sure an internal interface to remove blocks is a legitimate attack surface we need to worry about.
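One way to make this policy configurable is to let the consensus plugin decide, at construction time, whether a nil write is honored. The sketch below works under that assumption. The allow_block_deletion flag and the DeletionNotPermitted error are hypothetical names, not existing configuration.

```python
from typing import Dict, Optional


class DeletionNotPermitted(Exception):
    """Raised when a nil/None write is rejected by the configured policy."""


class PolicyLedger:
    """Hypothetical ledger whose deletion behavior is set by the consensus plugin."""

    def __init__(self, allow_block_deletion: bool) -> None:
        # e.g. a PoW-style plugin might enable this; a quorum-certificate
        # plugin, where rollback is pathological, would leave it disabled.
        self.allow_block_deletion = allow_block_deletion
        self._blocks: Dict[int, bytes] = {}

    def put_raw_block(self, block_number: int, raw: Optional[bytes]) -> None:
        if raw is not None:
            self._blocks[block_number] = raw
            return
        if not self.allow_block_deletion:
            raise DeletionNotPermitted(
                f"deletion of block {block_number} is disabled by policy")
        self._blocks.pop(block_number, None)

    def has_block(self, block_number: int) -> bool:
        return block_number in self._blocks


strict = PolicyLedger(allow_block_deletion=False)
strict.put_raw_block(0, b"genesis")
try:
    strict.put_raw_block(0, None)
except DeletionNotPermitted as err:
    print(err)  # deletion of block 0 is disabled by policy
```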

kletkeman commented 8 years ago

It has always been my understanding that one of the tenets of cyber ledgers is that they are effectively immutable once a block is written.

Bitcoin solves the issue by forking the chain and having validators add to the longest chain, effectively forcing all into consensus and abandoning false forks. To my knowledge, all data is left on the chain, for audit purposes if nothing else. The question is whether something like that could be adapted to OBC's non-mining mechanism.

jyellick commented 8 years ago

There is one notable situation where I believe this is necessary.

The ledger currently reports block height via the highest-numbered block in the ledger. So let's say that, through accident or intent, one of the blocks on disk becomes corrupt and, instead of indicating, say, block 345, it claims to be block 345345.
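A small model shows why the corrupted block is so disruptive. If height is derived from the highest-numbered block, one bad block number inflates it by hundreds of thousands. The sketch assumes blocks are numbered from 0, so height is the highest number plus one. block_height is a hypothetical helper.

```python
from typing import Dict


def block_height(blocks: Dict[int, bytes]) -> int:
    """Height as the highest block number + 1 (0 for an empty ledger)."""
    return max(blocks) + 1 if blocks else 0


healthy = {n: b"..." for n in range(346)}  # blocks 0..345
print(block_height(healthy))  # 346

corrupt = {n: b"..." for n in range(345)}  # blocks 0..344
corrupt[345345] = b"..."  # block 345 now claims to be 345345
print(block_height(corrupt))  # 345346
```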

In this scenario, the replica detects that there is no chain from 344 to 345345, but it's impossible to 'recover' this chain, because it does not in fact exist.

Now, the replica learns that the consensus network is processing block 346. As long as block 345345 exists, the replica cannot execute new transactions. Waiting for blocks 346 through 345345 to be processed does not seem like a viable option, so deleting block 345345 seems like the right answer.
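This recovery can be sketched as two steps: delete every local block at or above the height the network reports, then list the gaps that must be fetched from peers. The helper names are hypothetical, and the del statement stands in for the proposed nil write through PutRawBlock. The sketch assumes that "processing block 346" means blocks 0 through 345 are committed, i.e. a network height of 346.

```python
from typing import Dict, List


def trim_above(blocks: Dict[int, bytes], network_height: int) -> List[int]:
    """Delete local blocks numbered at or above network_height; return them sorted."""
    doomed = sorted(n for n in blocks if n >= network_height)
    for n in doomed:
        del blocks[n]  # stands in for the proposed PutRawBlock(n, nil)
    return doomed


def missing_below(blocks: Dict[int, bytes], network_height: int) -> List[int]:
    """Block numbers the replica still needs to fetch from its peers."""
    return [n for n in range(network_height) if n not in blocks]


local = {n: b"..." for n in range(345)}  # blocks 0..344 are intact
local[345345] = b"..."                   # block 345 claims to be 345345
print(trim_above(local, 346))     # [345345]
print(missing_below(local, 346))  # [345]
```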

I think the key point to emphasize here is that this is not an attempt to modify an immutable ledger; it is intended to execute only once the node is already in a bad/corrupt state. This is a recovery path, not normal operation.

frankyclu commented 8 years ago

jyellick's point is to provide a mechanism in the code to remove corrupted blocks. However, it does raise the question of whether it would provide a legitimate way for people, or even attackers, to manipulate a ledger that is supposed to be immutable.

The problem that needs to be addressed here is not whether this API is needed, but how we first define a way to handle disasters, such as situations where your ledger gets corrupted.

I am assigning this issue to myself, as this should be a feature included in the release plan.

kletkeman commented 8 years ago

You mean, presumably, that the chain will be trimmed back to the last known uncorrupted state and repaired by asking a peer to send the rest of the chain?
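Finding "the last known uncorrupted state" requires some way to tell good blocks from bad ones. A common approach, assumed here rather than taken from the ledger code, is to check that each block stores the hash of its predecessor and to trim at the first broken link. Block, build, last_good_index, and repair are hypothetical names, and SHA-256 over a simple field encoding is an illustrative choice.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENESIS_PREV = b"\x00" * 32  # placeholder predecessor hash for block 0


@dataclass(frozen=True)
class Block:
    number: int
    prev_hash: bytes
    payload: bytes

    def digest(self) -> bytes:
        h = hashlib.sha256()
        h.update(self.number.to_bytes(8, "big"))
        h.update(self.prev_hash)
        h.update(self.payload)
        return h.digest()


def build(payloads: List[bytes]) -> List[Block]:
    """Build a correctly linked chain (used here to simulate a peer)."""
    chain: List[Block] = []
    prev = GENESIS_PREV
    for i, payload in enumerate(payloads):
        block = Block(i, prev, payload)
        chain.append(block)
        prev = block.digest()
    return chain


def last_good_index(chain: List[Block]) -> int:
    """Index of the last block in an unbroken, correctly numbered prefix (-1 if none)."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].digest() if i else GENESIS_PREV
        if block.number != i or block.prev_hash != expected_prev:
            return i - 1
    return len(chain) - 1


def repair(local: List[Block], peer: List[Block]) -> List[Block]:
    """Trim local to its last verified block, then append the peer's suffix."""
    keep = last_good_index(local) + 1
    candidate = local[:keep] + peer[keep:]
    # Do not trust the peer blindly: the spliced chain must verify end to end.
    if last_good_index(candidate) != len(candidate) - 1:
        raise ValueError("peer suffix does not link to the local prefix")
    return candidate


peer = build([b"tx%d" % i for i in range(6)])
# Local copy: blocks 0-2 intact, block 3 corrupted into "block 345345".
local = peer[:3] + [Block(345345, peer[3].prev_hash, b"garbage")]
repaired = repair(local, peer)
print(len(repaired), repaired == peer)  # 6 True
```

Hash links only show that the spliced chain is internally consistent. They do not show that the peer's blocks are the ones the network actually agreed on; the consensus layer would have to answer that separately.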

Of course, if the corruption was caused by a bug in software that is deployed everywhere, then there is a non-zero chance that all chains are corrupted and that they all enter the recovery state within seconds. It is more likely to occur after a local upgrade, though, so perhaps recovery could include an immediate rollback of the chaincode followed by repair of the chain.

Interesting issue.

ghaskins commented 8 years ago

@jyellick I see, that makes sense.

@kletkeman re: immutability. While immutability is certainly desired/required, bitcoin/POW lacks those guarantees due to its design (which is why it's uninteresting to a portion of the blockchain community where immutability is paramount). For instance, consider a partitioned network in a POW design: both sides of the partition may commit disparate blocks during an arbitrarily long outage, but the inferior partition may experience rollback after the partition is repaired.

jyellick commented 8 years ago

The details of the recovery would be up to the consensus implementation.

For quorum based implementations, the chain would presumably trim any transactions which the consensus agrees have never happened. Further verification and recovery of transactions which have occurred would also depend on the consensus implementation.

My original intent with this issue was simply to suggest a possible modification to the internal API to enable consensus implementations to perform recovery however they see fit. However, it might be good to discuss more broadly what defaults and configurable parameters should exist. For some applications, any automated recovery which involves deletion or modification of blocks on the chain might be unacceptable, regardless of consensus. For others, completely autonomous recovery might be the only option.

My inclination would be to default to allowing consensus to recover the blockchain in whatever way it sees safe. Then, optionally, expose configuration for the replica to simply stop and await human intervention to assist in recovery of the bad state in the event that committed blocks must be deleted or altered.
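The proposed default (let consensus repair the chain) and the optional "stop and await human intervention" mode could be expressed as a single policy switch. This is a sketch with hypothetical names (RecoveryPolicy, HaltForOperator, plan_recovery). agreed_height is assumed to be the number of blocks the consensus network has committed.

```python
from enum import Enum
from typing import Iterable, List


class RecoveryPolicy(Enum):
    AUTO = "auto"  # let consensus repair the chain (the proposed default)
    HALT_FOR_OPERATOR = "halt"  # stop and wait for a human instead


class HaltForOperator(Exception):
    """Raised when recovery would delete committed blocks under HALT_FOR_OPERATOR."""


def plan_recovery(local_blocks: Iterable[int], agreed_height: int,
                  policy: RecoveryPolicy = RecoveryPolicy.AUTO) -> List[int]:
    """Return the local block numbers that must be deleted to match agreed_height."""
    to_delete = sorted(n for n in local_blocks if n >= agreed_height)
    if to_delete and policy is RecoveryPolicy.HALT_FOR_OPERATOR:
        # Deleting committed blocks needs a human decision under this policy.
        raise HaltForOperator(
            f"recovery would delete {len(to_delete)} committed block(s)")
    return to_delete


print(plan_recovery([0, 1, 2, 3, 4], agreed_height=3))  # [3, 4]
```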

ghaskins commented 8 years ago

@jyellick my gut says that we don't have to worry too much as long as the verbs are not accessible from something external, such as the VP REST API, consensus protocol, or chaincode API interface. Obviously a malicious or broken layer with access to the proposed API could wreak havoc. However, below a certain level in the system, any number of bad things could be done (such as a human running "rm -rf /var/openchain") that would simply fall under "byzantine" type scenarios, in which the presumably 2f+1 remaining healthy nodes help keep the network itself afloat. Now that I understand what you are aiming for, I don't think it's necessarily a bad idea as long as the attack vectors are considered.

ghaskins commented 8 years ago

Also, as a counterpoint for how much recovery functions are needed: A valid recovery option should technically be to simply blow the node's state away and let it rebuild from scratch from the network. This needs to be supported anyway, so we will need to do a careful risk+reward analysis for any advanced intermediate recovery efforts IMO.

kletkeman commented 8 years ago

@ghaskins I believe that bitcoin does guarantee that the chain is immutable, but does not guarantee that a branch will prevail. Thus, it can never give 100% certainty. But I do agree that such an API for internal use may be needed.

On the other hand, discovering corruption might mean that the node should do a hard reset and come up as if new, rolling back whatever update may have caused the corruption and getting a new copy of the chain sent to it by the network of validators, as if it had just been added as a validator. That would be pretty safe as recovery options go, no?

@jyellick My point is that the consensus algorithm cannot agree that some number of transactions never happened because, by definition, they did. Allowing plugins to muck with the chain is an open door to hackers and fraud IMO. Even allowing the node to attempt a repair of the chain by mutation sounds like a dangerous opening for a hack.

kletkeman commented 8 years ago

Ah, I see we came to the same conclusion. Great.

ghaskins commented 8 years ago

@kletkeman I suppose it depends on how you define "immutable" then ;). Bitcoin allows blocks that were once confirmed (from the perspective of an observer such as an SPV wallet) within the chain to effectively vanish if the network decides to change branches. This to me means that the chain itself is not immutable. Would you agree?

kletkeman commented 8 years ago

@ghaskins To my knowledge, Bitcoin never guarantees that any branch in the chain is going to survive. It works on probabilities, which increase dramatically once a branch is superseded in length and the new one starts to grow and obtains consensus. I don't think that we can use mutable or immutable in that context, but rather less probable and more probable.

The physical chain, however, remains immutable, again to my knowledge. Another branch may become the main branch, but the branch that was superseded and all its transactions remain intact. Valid transactions that are "lost" to the newly valid chain are eventually recirculated I believe, and find a new home in a new block.

I'm not sure how the wallet might perceive the supersession. A transaction that was invalid might vanish effectively, but it could still be found and would return some "invalid" signal I think. I've not read in enough depth on Bitcoin to know the exact nature of these edge cases.

ghaskins commented 8 years ago

Bitcoin/POW is predicated on maintaining perpetual access to "51% cpu power". The fundamental flaw in the design is that there is no way to quantify and authenticate what constitutes "51%" at any given point in time, other than to assume that the longest chain must have been created by that 51% and that you never lose total connectivity to that capacity. Both of those are dicey prospects, which leaves the design susceptible to arbitrary rollbacks of previously committed blocks under certain circumstances. In many scenarios, the rolled-back transactions may be manually replayed on the newly dominant branch and confirm just fine, but there is no guarantee that the transactions will still be valid against the new state of the ledger. IMO, if your partition was the result of a malicious act, there's a good chance the transactions will be intentionally invalid.

Consider a ledger consisting of blocks A->B->C->D->E at t0. A new, longer chain could emerge at t1 that forked after B to look like A->B->W->X->Y->Z. If I had a transaction confirmed in C, it just disappeared from the perspective of validators (W can be thought of as C': it has the same block height as the original C but no longer contains the transaction that I once believed it did). Bitcoin proponents will tell you that the solution is to simply wait for enough blocks (6 blocks / 60 minutes is typically advised) to make it less likely for an attacker pool to "catch up". The problem is that, with no definition of what constitutes the 51% capacity, no way to identify whether one is connected to "the real capacity", and virtually no practical limit on how far back a fork can roll back, there isn't actually a safe number of blocks to wait: a rollback could technically occur at any time in the future for any depth (at least back to the last checkpoint shipped with the software), and there is no guarantee a transaction replay will work. Is this likely? Probably not. Is it possible? Absolutely. All you need is a network partition that exceeds 6 blocks/60 minutes in duration. Messing with someone's BGP routes is one way this can occur (and has occurred).
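The A->B->C->D->E versus A->B->W->X->Y->Z scenario can be modeled in a few lines. This is a toy. It picks the chain with the most blocks, whereas Bitcoin actually follows the chain with the most cumulative proof of work, and the transaction names are made up for illustration.

```python
from typing import Dict, List, Set


def longest_chain(chains: List[List[str]]) -> List[str]:
    """Toy fork choice: pick the chain with the most blocks (first wins ties)."""
    return max(chains, key=len)


def confirmed_txs(chain: List[str], block_txs: Dict[str, List[str]]) -> Set[str]:
    """All transactions contained in the blocks of the given chain."""
    return {tx for block in chain for tx in block_txs.get(block, [])}


t0 = ["A", "B", "C", "D", "E"]
t1 = ["A", "B", "W", "X", "Y", "Z"]
block_txs = {"C": ["my-tx"], "W": ["other-tx"]}

# At t0 the observer sees only the original chain; at t1 the longer fork arrives.
print("my-tx" in confirmed_txs(longest_chain([t0]), block_txs))      # True
print("my-tx" in confirmed_txs(longest_chain([t0, t1]), block_txs))  # False
```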

That isn't to say it's completely rosy on the other side of the fence. Even something like PBFT is based on assumptions (in this case, the assumption that no more than f out of 3f+1 nodes will be byzantine).

So bitcoin assumes that there will be no absolute network partition, lasting longer than 60 minutes, between you and an undefined set of unidentifiable nodes. PBFT assumes there will be no more than f faulty peers among a cryptographically verified whitelist of 3f+1 nodes. If either of those assumptions is violated, certain elements of the past have the potential to be altered, impacting the ledger's immutability and a transaction's validity. Further, we would be left without any way to even detect that it happened, other than an external audit from a redundant data source. So to your point, I suppose we could say both systems are immutable up to their given design constraints, and then all bets are off. The difference is simply how easy it is to subvert the assumptions (the probability you speak of) ;)
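Here is the arithmetic behind "no more than f out of 3f+1". With n replicas, at most f = floor((n - 1) / 3) may be byzantine. A quorum must be large enough that any two quorums share at least one correct replica. Two quorums of size q overlap in at least 2q - n replicas, so requiring 2q - n >= f + 1 gives q = ceil((n + f + 1) / 2), which is exactly 2f + 1 when n = 3f + 1. pbft_bounds is a hypothetical helper.

```python
from typing import Tuple


def pbft_bounds(n: int) -> Tuple[int, int]:
    """Return (f, quorum) for n replicas under the n >= 3f+1 fault model."""
    if n < 1:
        raise ValueError("need at least one replica")
    f = (n - 1) // 3  # largest f with 3f + 1 <= n
    quorum = (n + f + 2) // 2  # ceil((n + f + 1) / 2); equals 2f + 1 when n == 3f + 1
    return f, quorum


for n in (4, 7, 10):
    print(n, pbft_bounds(n))
# 4 (1, 3)
# 7 (2, 5)
# 10 (3, 7)
```

For n = 5 the sketch gives a quorum of 4 rather than 2f + 1 = 3. With only 3 out of 5, two quorums could overlap in a single, possibly faulty, replica.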

So let me restate: crypto-ledgers by their nature are guaranteed immutable up to certain limits (some lower than others). IMO, bitcoin's/POW's limits are too low to make it practical for use in any substantial extrinsic system of value, because there could be a large incentive disparity between BTC and the external assets. But if that weren't the case, we wouldn't be here. We'd be hacking Ethereum instead.

kletkeman commented 8 years ago

Bitcoin is a special case in that it gets much more expensive to mine Bitcoins at regular intervals, making it very difficult to recreate history sufficiently to attack the chain with a fraudulent branch. But you are right that it has flaws that cannot really be tolerated in systems that require higher performance. We can't wait 10 minutes per block, especially at higher volumes ... the backlog could run to days. So that system cannot work.

But this all started with a simple request to add a block deletion API if I recall, and that method is also flawed.

I prefer to go back to our joint thought on simply blowing away the node if it corrupts its chain and starting fresh. Adding nodes is already supported, so blowing one away only requires a bit of extra work to make sure it starts clean and uncorrupted.

ghaskins commented 8 years ago

Note that the attack vector I am referring to actually runs in the opposite direction (creating a shorter chain in a weaker pool within a forced subordinate partition, allowing a transaction to confirm, and then releasing the partition back to the general network, where it will almost assuredly be rolled back by the higher-bandwidth network). An attacker could partition the network, transact with their victim(s) on the subordinate partition and the real bitcoin network simultaneously (thus double spending), and then release the partition. The victims would see their confirmed transactions rolled back because the subordinate partition would not have the same computing power available.

Anyway, I digress. While I don't agree that the proposed API change is necessarily broken if carefully designed, I do agree it's not likely worth the effort compared with a node rebuild.

kletkeman commented 8 years ago

An interesting attack vector. It requires that a hacker make the effort to set up an alternative to Bitcoin and then create an application that can use both Bitcoin and the alternate cyber ledger to transact. It also requires that this remain hidden long enough for a big score. And since the Bitcoin community is pretty well-versed in this sort of risk, the double-spend rollback would have to occur on the new alternate obc-driven network.

Also, partitioning the network could be detected and could throw all sorts of alarms. Perhaps OBC should refuse to complete transactions if there are too few nodes on the subnet. This is the sort of systems engineering that will have to take place anyway, or else the platform cannot be used in any environment where fraud is common (and is there any other kind of environment?)
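The suggestion to refuse transactions when too few nodes are reachable could look like the guard below. The n - f threshold, with f = floor((n - 1) / 3), is an assumption borrowed from the 3f+1 fault model discussed earlier in the thread. When n >= 3f + 1, two disjoint sides of a partition cannot both contain n - f validators, so at most one side would keep accepting transactions. check_reachability, PartitionSuspected, and the vp0..vp3 names are hypothetical.

```python
from typing import Set


class PartitionSuspected(Exception):
    """Raised when too few whitelisted validators are reachable."""


def check_reachability(reachable: Set[str], validators: Set[str]) -> None:
    """Refuse to proceed unless at least n - f known validators are reachable.

    reachable should include this node itself. Only whitelisted validators
    are counted, so unknown peers cannot pad the number.
    """
    n = len(validators)
    f = (n - 1) // 3
    needed = n - f  # equals 2f + 1 when n == 3f + 1
    have = len(reachable & validators)
    if have < needed:
        raise PartitionSuspected(
            f"only {have} of {n} validators reachable; need {needed}")


validators = {"vp0", "vp1", "vp2", "vp3"}
check_reachability({"vp0", "vp1", "vp2"}, validators)  # passes: 3 >= 3
try:
    check_reachability({"vp0", "vp1", "mallory"}, validators)
except PartitionSuspected as err:
    print(err)  # only 2 of 4 validators reachable; need 3
```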

I am not, by the way, suggesting that such an API would not be carefully designed, although thinking of every possible fraudulent use of a low-level API that offers an opportunity to rewrite history is almost certainly harder than it seems.

Rather, when the consensus algorithm is allowed to make the decision on what to do (as was mentioned,) it really means letting the designer of the new consensus algorithm make the decision as to how the API is used to "fix" things. So it comes down to using the API carefully, and that requires expertise that may or may not be out there in the community that will use the platform.

hyperledger-archives / fabric

Allow deletion of raw blocks via PutRawBlock #407