neo-project / neo

NEO Smart Economy
MIT License
3.47k stars 1.03k forks source link

checklist of 3.6.0 PRs that will cause FORK #2910

Closed vang1ong7ang closed 10 months ago

vang1ong7ang commented 1 year ago

hope all the FORK points can be resolved before the upgrade

by submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0.

ALL the PRs listed above should not be blamed, since looking at these pull requests individually, there is no problem with their implementation. while in the context of upgrading from v3.5.0, more historical baggage is introduced. maintainers should create a new PR to handle all possible fork points to ensure mainnet state consistency.

I'm fine if the final decision is to ignore the fork risks and proceed to upgrade to v3.6.0. we will still try to monitor, warn and prevent possible forks as much as possible to keep the mainnet safe though I don't think we should let the mainnet take the risk.

hardfork PRs should learn from #2810

if (!IsHardforkEnabled(Hardfork.HF_Basilisk))
{
    RuntimeNotifyV1(eventName, state);
    return;
}
dusmart commented 1 year ago

(Optional) Additional context

Based on NEO's current situation, I also understand you if the final decision is to ignore the fork risks and proceed to upgrade to v3.6.0.

roman-khimov commented 1 year ago

2849 shouldn't be an issue, its behavior is controlled via the hardfork (see #2881). But I agree that many other can be problematic. Unfortunately, we have an unwritten convention of relying on resyncs (at least since #2560) and to be fair it even works most of the time. Anyway, using proper upgrade mechanisms would make Neo a much better place, especially since we have them (both native activations and hardforks, but likely HFs are easier to use and can cover any situation).

roman-khimov commented 1 year ago

I'll also add that the scenario of "submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0." is not always possible. If some PR is only adding some new native contract API then any attempt to call it will make transaction FAIL miserably. This means that after the upgrade it can do something instead of failing, but it's a bit harder to leverage that (not impossible though).

dusmart commented 1 year ago

I'll also add that the scenario of "submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0." is not always possible.

They are basically the same. You can send a transaction first, and then use the second to check it's state in FORKTRIGGER().

Any change including add, delete and modify should all be treated seriously.

vang1ong7ang commented 1 year ago

I'll also add that the scenario of "submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0." is not always possible. If some PR is only adding some new native contract API then any attempt to call it will make transaction FAIL miserably. This means that after the upgrade it can do something instead of failing, but it's a bit harder to leverage that (not impossible though).

it's not right.

by calling Ledger.GetTransactionState, the attackers can still succeed.

dusmart commented 1 year ago

by calling Ledger.GetTransactionState, the attackers can still succeed.

I also figure out a new method to do the same if there is no such a native contract. If one tx in 3.5.0 will fail, you can add GAS.burn(GAS.balance(sender)) in it. And then send a second tx that will cost some GAS. When it comes to 3.6.0, the second will be an invalid tx and stuck the upgrade process.

vncoelho commented 1 year ago

In fact, @igormcoelho and I have been in favor of not changing tx state after any modification.

That was the reason we created that simple plugin on neo-modules for neo2x and it is still used nowdays.

We need to preserve states, MPT or Graphs are mostly for this purpose of having a solid compact state representation.

A fork tag can change the behavior just for the next events. I do not see much options instead of a general tag and ensurance of past states. As we have observed in other protocols, we do not need witness itself, we just need the result of that verification at that time. And, for sure, that should never change.

vncoelho commented 1 year ago

My suggestion is to:

Surely there could be options to flexibilize that, maybe we have a patch that makes sync faster and it is compatible.

dusmart commented 1 year ago
  • Abolish resync during upgrades.

Actually there are some PRs rely on resync heavily. The PR for optimizing the vote reward data requires a full resync to help future reward calculating.

My advice is to drop these PRs from 3.6.0 if we need continuous compatibility. For #2841, there are only about 4GB data for a normal node. I don't see any significant reason to do it and make the upgrade risky. If you have ever run a ethereum node or some others, you will know NEO is far more better on that.

Jim8y commented 1 year ago

What resync actually means? Reexecution or re-downloading everything?@dusmart

dusmart commented 1 year ago

What resync actually means? Reexecution or re-downloading everything?@dusmart

I mean re-creating the previous state here based on whatever makes it happen. It makes the MTP useless on checking whether the upgrade is abused or not.

vncoelho commented 1 year ago

I fully agree @dusmart. That enters in the case when full resync is just recommended if there is a relevant improvement on the sync process and data. 4GB is not really relevant now.

dusmart commented 1 year ago

BTW, Vang has done a POC that makes it even harder to identify whether a TX abuses the upgrade or not. I believe advanced patten matching could also fail here if you've seen that OpCodes.

If the previous state is kept, then a simple MTP state root check will be very helpful for identifying it.

vncoelho commented 1 year ago

BTW, Vang has done a POC that makes it even harder to identify whether a TX abuses the upgrade or not. I believe advanced patten matching could also fail here if you've seen that OpCodes.

If the previous state is kept, then a simple MTP state root check will be very helpful for identifying it.

This guy has no limits...ahahaha

The state preservation is the correct way to go. For now, revert those that require resync if they are not critical or with a considerable relevance to optimization.

I can push these PRs next week if you want, but I think you guys from NGD are more expert on that, count on us for support and review in any case.

vang1ong7ang commented 1 year ago

The state preservation is the correct way to go.

I fully agree with that. it finally ensures the block finality feature and provides necessary guarantees to applications and exchanges.

dusmart commented 1 year ago

This guy has no limits...ahahaha

Ahahaha, if you notice his simple POC here in detail you will find that he use NEO.transfer instead of GAS.transfer. NEO contract's state would have already been changed by the #2841. Therefore a simple change to the MTP plugin (that bypass the NEO contracts' state) can not identify it, either.

If they still push it on mainnet and want to make sure the success, they will have to bypass all the state changes made by the new PRs and calculate the state root, then compare it with the old one. I'm not sure whether someone has had a plan on doing it. Or just want to pretend that there should have no hacker making use of it. Sorry that I made a malicious assumptions.

vang1ong7ang commented 1 year ago

Unfortunately, we have an unwritten convention of relying on resyncs (at least since #2560) and to be fair it even works most of the time.

Originally posted by @roman-khimov in https://github.com/neo-project/neo/issues/2910#issuecomment-1722136253

please don’t take chances and put mainnet users at risk.

Having sex without a condom for several years without getting pregnant is not something worth showing off.

roman-khimov commented 1 year ago

@vang1ong7ang, @dusmart, I'm explaining the status quo and the way it came to be. These problems are known, they're not new. There are bad things with the current approach, there are good ones as well. The topic of state finality had been beaten to death (literally, as in state finality death) in #1526.

dusmart commented 1 year ago

There are bad things with the current approach, there are good ones as well. The topic of state finality had been beaten to death (literally, as in state finality death) in #1526.

Thanks always for lots of links. They really help. Great to see the context.

Then I think that's what we should pay more attention to instead of optimizing the storage (which also bring dAPPs relying on old interfaces a bad experience).

I doubt that someone has exploited it in a small amount. Could you please share us more of the details on how NEO check the upgrade's safety before?

roman-khimov commented 1 year ago

that's what we should pay more attention to

Likely so. Some Neo implementations even have StateRootInHeader extensions.

optimizing the storage (which also bring dAPPs relying on old interfaces a bad experience).

2841 is trivial to fix keeping exact old behavior for NEO API. At least while we're operating in a "floating state" paradigm. And it solves #2815, which is about specific use cases for this data, not exactly about performance.

Could you please give us more of the details on how you check the upgrade's safety before?

I guess that's a bit out of my scope. But in general state changes (as in https://github.com/neo-project/neo-modules/tree/master/src/StorageDumper) can be compared, usually it's just the genesis that changes (new methods are added). #2841 is like the first real exception to this in two years, there NEO/GAS balances were compared directly IINM. And yes, of course this is racy (check at height N says nothing about height N+1, upgrades take time) and it's not just new methods, there can be new opcodes/flags/whatever.

dusmart commented 1 year ago

I guess that's a bit out of my scope. But in general state changes (as in https://github.com/neo-project/neo-modules/tree/master/src/StorageDumper) can be compared, usually it's just the genesis that changes (new methods are added).

If you haven’t pay attention to it, I think most of developers didn’t, either. We all count on others to do these trivial things.

Actually, I proposed to Vang that a simple abuse on mainnet should be done, sending NEP17 to poly network using the method mentioned in this issue to test if someone of NEO or Poly could catch it. The result will probably be bad.

He took his responsibility on opening an issue here to correct our upgrading method. Even if the previous one could draw more attention and win him more reputation. Even if he knows most of developers may have already known this issue and could ignore his effort.

roman-khimov commented 1 year ago

If you haven’t pay attention to it

I'm paying a bit more attention to state matches between NeoGo/NeoC# (they usually match whatever they are and yes, we compare state changes the way described above (at least until new signed stateroots start floating around which NeoGo always catches and compares)) and NeoFS sidechains (they usually work and requirements there are somewhat different).

opening an issue here to correct our upgrading method

That's good. But it's a paradigm shift. So I'd prefer to do this post-3.6, at least to be done with some current set of changes.

Even if the previous one could draw more attention and win him more reputation.

Can't really comment on it. I wouldn't want that reputation.

vang1ong7ang commented 1 year ago

I DO NOT WANT ANY KIND OF REPUTATION.

Objectively existing problems and facts should be given primary attention.

fork is far more harmful than deny of service.

To be honest, I don't have any hope that anyone will fix the problems mentioned in the current issue before v3.6.0. So we are improving network monitoring and some other emergency response plans to reduce losses when extreme situations occur.

vang1ong7ang commented 1 year ago

I observed both neoc# and neogo lack rigorous status checking when upgrading. Although I don't know much about neogo, I know that its behavior is different from the c# version in many implementation details. This difference in implementation can easily lead to inconsistency between the two states, which is a dangerous thing on the blockchain (1. So far, no one has deliberately used this phenomenon to create problems. 2. The absolute mainstream of c# impl in the neo ecosystem alleviates covers the severity of this problem).

vang1ong7ang commented 1 year ago

@vang1ong7ang, @dusmart, I'm explaining the status quo and the way it came to be. These problems are known, they're not new.

I don't think it is the most responsible way if you have already known these problems but still decided (or tacitly allow) to upgrade to 3.6.0 without any protection.

There are bad things with the current approach, there are good ones as well.

of course, everything has both sides, even the deny of service bug has good things, at least, it's one of the most efficient way to pause the network and reduce loss spread if attackers make use of the vulnerability mentioned in this issue.

optimizing the storage (which also bring dAPPs relying on old interfaces a bad experience).

2841 is trivial to fix keeping exact old behavior for NEO API. At least while we're operating in a "floating state" paradigm. And it solves #2815, which is about specific use cases for this data, not exactly about performance.

roman-khimov commented 1 year ago

The absolute mainstream of c# impl in the neo ecosystem alleviates the severity of this problem

It's not. It just:

decided (or tacitly allow) to upgrade to 3.6.0 without any protection

Please, check 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0 changes.

it just provides the ability to repair the original error state in extreme cases

Yeah, sure. But once you have it's soooo convenient to use it for other purposes. Good ones, btw. And so it's used this way. At least for the past two years. With every minor release (and sometimes a bit more often).

vang1ong7ang commented 1 year ago

decided (or tacitly allow) to upgrade to 3.6.0 without any protection

Please, check 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0 changes.

that's what "tacitly" means.

every time i was expecting it should be the last "floating state" fork, even till the basilisk was introduced.

vncoelho commented 1 year ago

We must not upgrade to 3.6.0, this is not correct to put mainnet state at risk.

Let's revert the ones that need resync, like voting reward. We, then, solve the others with tags.

The voting reward could be merged with a tag as well, and we start to improve it from now. Later, we do a tentative to resync past states in a safe manner in order to optimize clients' data.

roman-khimov commented 1 year ago

I'm sure we don't need any reverts. In almost all cases (probably with the only exception of #2892) API changes can be tied to the Basilisk hardfork. Of those I'd say NeoToken.GetAccountState and NeoToken.UnclaimedGAS are the most problematic. We need 3.6.1 to fix #2907 anyway, so we can think of basiliskifying our code a bit more (it doesn't take a lot of time either, can be rolled out next week).

dusmart commented 1 year ago

First of all. I want apologies to core developers whose work are listed in this checklist. I want to be absolutely clear that I fully comprehend and firmly stand behind your tireless efforts.

Since this issue's name is FORK instead of HARD FORK, I think we should add https://github.com/neo-project/neo/pull/2818 to the list cause it's a soft fork.

Soft fork will not cause a real fork on the mainnet and do not have that large damage. But it will bring obstacles to nodes that will be upgraded a little later.

For soft fork, it will also be better to activate it after the new fork tag. Then we can give all nodes (wallet's, CEX's, crosschain's, dAPP's ...) a few more time to do a leisurely upgrade.

vncoelho commented 1 year ago

image

image

@dusmart @vang1ong7ang

Now it can be replicated with NeoCompiler Eco in a kind of a simple manner. Here you can see two clients with different balances. One was re-sync with 3.6.0.

vang1ong7ang commented 12 months ago

Mainnet is at 3.6 now, which makes 3.6-specific part of #2910 irrelevant (not the underlying problem, of course), removed from the list.

Originally posted by @roman-khimov in https://github.com/neo-project/neo/issues/2914#issuecomment-1727319271

cheers :)

this is not the most responsible way. let's exercise some judgment, please.


there are 3 blockchain explorers i know on neo network which is https://neotube.io/, https://explorer.onegate.space/ and https://dora.coz.io/

after sending the upgrade detection transaction on block 4125581 https://neotube.io/transaction/0x69fa0b479e4d318e49640f240664c20fafe87005211818b8b6e08b80294a698a

two of the only there explorers https://explorer.onegate.space/ and https://dora.coz.io/ blocked at block 4125581.

THE TRANSACTION IS A LEGAL TRANSACTION WITHOUT ANY DAMAGE

I apologize to all users and projects affected by this transaction but do understand why I have to do this:

apologize again for the sudden operation.

roman-khimov commented 12 months ago

cheers :)

That's just a fact and me trying to keep track of at least some patch version. You can blame me for 1/7 of this fact (according to https://governance.neo.org/), if you want to. But 1/7th is not enough to change facts. And not following the scheduled upgrade will soon result in consensus problems. Go figure.

vang1ong7ang commented 12 months ago

just as @shargon said:

YOU SHOULD BE CLEAR THAT IT COULD LEAD TO A FORK OR WORSE STILL, THAT A THIRD PARTY ABUSES SAID ATTACK AND YOU ARE DIRECTLY OR INDIRECTLY RESPONSIBLE FOR ITS EXPLOITATION.

remember it is mainnet which should be given 1000x attention to than testnet

Jim8y commented 12 months ago

All I can say is thank god we have @vang1ong7ang here on our side.

dusmart commented 12 months ago

by pausing the outdated nodes, those nodes won't be cheated by those malicious transactions. otherwise all the service with the outdated neo nodes may show false information

Genius. I thought that we will have to compare the token states all the way to 3.6.1 to safeguard the chain. Using soft fork to protect us with a few inconvenience is totally reasonable to me.

MAY THE SAFETY BE ALWAYS WITH US

vncoelho commented 12 months ago

For me it is unbelievable that CNs were upgrade. @vang1ong7ang warned and I replicated the attack. This is irresponsibility if there is any state that has been affected. I imagine that you did a complete verification before upgrading the node.

My friend, @vang1ong7ang, were you able to verify that? @superboyiii

vang1ong7ang commented 12 months ago

I confirmed with @superboyiii that the state verification is done, though it cannot completely ensure the safety, it is a good and important protective measurement.

however unfortunately I didn't do the verification yet.

Jim8y commented 11 months ago

@vang1ong7ang @vncoelho @dusmart I have a question, when a new block is generated and my transaction is included, how do i know whether a node has the correct execution result. How does a node know that he has the correct execution results that have the same states with other nodes. For neo of course

dusmart commented 11 months ago

compare the state root will be great

https://github.com/neo-project/neo-modules/blob/master/src/StateService/README.md

roman-khimov commented 10 months ago

I propose closing this because 3.6.0 is already an old story, the specific list is outdated. For the future versions the real underlying problem will be fixed by #1526 (and associated technologies like #2932).