Support forking from mainnet (or any target network)

janewang commented 3 months ago

What problem does your feature solve?

To be able to replicate state and issues seen on another network, or to use that state for testing.

What would you like to see?

Be able to recreate a state from mainnet. The node could be forked from the target network at a specific block, or keep continuously syncing to the target network.

What alternatives are there?

leighmcculloch commented 3 months ago

Internal document we should deliver on the action items for:

https://docs.google.com/document/d/1Ml8ilIe7fTdnPUJuqFvPqh9YrgFHs2OuYP_UJmY__lU

leighmcculloch commented 3 months ago

Proposed requirements:

Start quickstart, core catches up to a specific ledger (any ledger, not checkpoint) then disconnects from network and quorum and shifts to an unsafe quorum with just itself.
Maintains connection with local RPC, and local Horizon, etc. Or, RPC and local Horizon are started after the fork.
G account impersonation:
- Be able to submit txs for existing mainnet G accounts without holding the signers.
- Be able to submit soroban auths for existing mainnet G accounts without holding the signers.

Ideal requirements:

The forked network has a different network passphrase to the original network.
That the bulk of the fork functionality is built directly into stellar-core to make it possible to use stellar-core in isolation to connect to and fork a network.
C account impersonation:
- Be able to submit soroban auths for existing mainnet C accounts without executing their __check_auth logic.

I think most of the work for this is adding capabilities to stellar-core, with some small work to expose those capabilities to quickstart. I don't think we could realistically implement all of this in quickstart alone, because there's no way to stop stellar-core at a specific ledger, and starting and stopping core while swapping out config files is likely to be brittle.

cc @anupsdf @dmkozh @janewang @tomerweller

dmkozh commented 3 months ago

G account impersonation: Be able to submit txs for existing mainnet G accounts without holding the signers. Be able to submit soroban auths for existing mainnet G accounts without holding the signers.

I'm not sure if real 'impersonation' is feasible; that seems too cumbersome and risky to maintain in Core. I think we could just disable signature verification if a certain Core config flag is set. This still seems risky, but at least is much easier to control. One can also use this mode to fund an arbitrary number of test accounts and then switch back into 'enforcement' mode (e.g. when they want to set up some sort of integration test).
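A minimal sketch of what such a flag-gated check could look like, in Python for brevity. Everything here is an assumption for illustration: the `enforce_signatures` flag name is hypothetical, and HMAC-SHA256 stands in for the ed25519 signatures Stellar actually uses.

```python
import hashlib
import hmac


def signature_ok(payload: bytes, signature: bytes, key: bytes) -> bool:
    """Stand-in check using HMAC-SHA256 (real Stellar signatures are ed25519)."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


def accept_tx(payload: bytes, signature: bytes, key: bytes,
              enforce_signatures: bool = True) -> bool:
    """Accept a tx; with the hypothetical flag off, skip verification entirely."""
    if not enforce_signatures:
        return True
    return signature_ok(payload, signature, key)
```

With the flag off, any payload is accepted regardless of its signature, which is why switching back to enforcement after funding test accounts matters.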

The forked network has a different network passphrase to the original network.

I don't think that's a good idea; the network id defines the contract id namespace, so if we change the passphrase, then the network would allow instantiating 2 SAC instances per asset, and that's generally not an operation mode that we'd want to support in any capacity.
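To illustrate the dependency: the network id is the SHA-256 hash of the network passphrase, so anything hashed together with it changes when the passphrase changes. The sketch below is a simplification. `FORK_PASSPHRASE` is a made-up value, and `derived_id` is a hypothetical stand-in that hashes the network id with a label; it is not the XDR preimage that core actually hashes for contract ids.

```python
import hashlib

PUBNET_PASSPHRASE = "Public Global Stellar Network ; September 2015"
FORK_PASSPHRASE = "Local Fork Network ; hypothetical"  # assumption: made-up passphrase


def network_id(passphrase: str) -> bytes:
    """Network id: SHA-256 of the UTF-8 network passphrase."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def derived_id(net_id: bytes, label: str) -> str:
    """Simplified stand-in for an id namespaced by the network id.

    Not the real contract id derivation (which hashes an XDR preimage);
    it only shows that the network id is an input to the hash.
    """
    return hashlib.sha256(net_id + label.encode("utf-8")).hexdigest()
```

Under this model, the same asset label yields two different ids on the two passphrases, which is the "2 SAC instances per asset" situation described above.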

That the bulk of the fork functionality is built directly into stellar-core to make it possible to use stellar-core in isolation to connect to and fork a network.

Isn't the bulk of the functionality either already in Core or something that has to be implemented in Core anyway (besides downstream service deps, that is)? I don't think we need to go beyond that: there needs to be some external orchestration, and I don't think it belongs in Core.

C account impersonation: Be able to submit soroban auths for existing mainnet C accounts without executing their __check_auth logic.

Similarly to G-accounts, we could just switch host to recording auth. I wouldn't try to go for more granular control than that.

MonsieurNicolas commented 3 months ago

wrt requirements above: those seem to be solutions more than actual requirements.

Adding arbitrary overrides/hooks to core seems very brittle (as it's not "the real thing") and will make adding features slow (because now you need to coordinate DevX and core teams on future changes). I also don't see why DevX (or others) would have to write different code depending on whether they're testing against a "real core" or against some arbitrary state (be it the local filesystem for a CLI or the client for browser-based solutions).

For background, we actually investigated some of those things as part of https://github.com/stellar/stellar-core/issues/2695 -- this was before Soroban.

Here are a few things to think about:

changing the network passphrase is probably not doable as it changes auth, but also all contract IDs (including SAC). So things like proxy contracts and SAC balances will break. It also breaks classic constructs like AMMs, CBs, etc.
auth breaking is the "canary": signature verification does not occur exclusively during auth. If people want to test interactions in the context of layer 2/bridge development, they will run into similar issues elsewhere (in some cases it's more that you need to control which role a specific address has, typically stored in some data entry or wrapped in some access token).
if you have multiple core nodes running, they all need to behave exactly the same.

I would actually try to flip this work on its head by exposing a much narrower set of functionality in core and let people outside iterate on functionality.

For example, we could add a special native contract (only enabled when a special flag is set) that allows creating, updating, and deleting arbitrary ledger entries. In a first version we can limit this to soroban code/data, but other methods could be added in the future to make changes to classic entries, network settings or even TTL entries.

Note that we would still need to do something to allow people to use this contract, so maybe the special flag that enables that functionality would also reset the "network admin account" somehow (so that people can submit transactions with it). For example, GAAZI4TCR3TY5OJHCTJC2A4QSY6CJWJH5IAJTGKIN2ER7LBNVKOCCWN7 on the current public network is "locked" right now and does not hold a lot of XLM (see its state in Lab).

With this functionality you can:

replace any contract code by anything -- so if I don't like a policy contained in a contract, I can just change it, so for example change the admin check code to "return true", or add new methods to popular contracts. The replacement could also just be a wrapper of sorts that performs some pre-post processing before invoking the "real" wasm.
replace ledger entries based on educated guesses (like if you know where a balance is stored) or by using the output of a simulation run (that bypasses all auth)

I could see the same logic built on top of this kind of functionality being usable either on top of a "quickstart image" like this, or in a purely client-side setting (browser based or CLI, where the "host" is not core).

leighmcculloch commented 3 months ago

👍🏻 Thanks, this is really helpful feedback.

If we went for the narrower set of functionality in core focused on supporting quickstart coordinating the forking and supporting ledger entry substitution, could we make these two changes in core?

https://github.com/stellar/stellar-core/issues/4427 so that we can start core, catch up to a specific ledger, then exit, change quorum cfg, restart core. Without this, can we in theory only do this at checkpoints, using the catchup command?
a new http endpoint that accepts a ledger entry, which overwrites that entry before the next ledger.
- This can be used to reset the network root account.
- This can be used to do anything that the contract @MonsieurNicolas you suggested could do, but without the need for people to build valid txs, or build contract invocations that requires rpc to simulate for costs and footprints.
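A toy in-memory model of the endpoint's semantics, as I read them from the description above: overrides are staged and only take effect when the next ledger closes. The class and method names are hypothetical, the entries are plain dicts rather than XDR, and treating `None` as a deletion is my own assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ForkedLedger:
    """Toy model of a ledger with an entry-override queue (hypothetical)."""
    seq: int
    entries: dict = field(default_factory=dict)  # ledger key -> entry
    pending: dict = field(default_factory=dict)  # overrides staged for next close

    def queue_override(self, key: str, entry) -> None:
        """What the proposed endpoint would do: stage an entry, don't apply yet.

        Assumption: passing entry=None stages a deletion.
        """
        self.pending[key] = entry

    def close_ledger(self) -> int:
        """Apply staged overrides, then advance to the next ledger."""
        for key, entry in self.pending.items():
            if entry is None:
                self.entries.pop(key, None)
            else:
                self.entries[key] = entry
        self.pending.clear()
        self.seq += 1
        return self.seq
```

Staging rather than writing immediately keeps every override aligned with a ledger boundary, so downstream services never observe a change in the middle of a ledger.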

With those two changes quickstart in fork mode would:

catch up core to ledger # (core shuts itself down after catch up)
change cfg of core+rpc+horizon to local core instance only quorum
start core, rpc, horizon
call core's new http endpoint to:
- reset thresholds of root account so the root's master key works again
- set balance to u32::MAX for native of root account
start friendbot (it uses the root account to issue test accounts)
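The sequence above could be sketched as a small ordered runner. This is only an outline under assumptions: the step names are labels for the items in the list, not real CLI commands or endpoints, and the actual work for each step is supplied by the caller.

```python
from typing import Callable, Dict, List

# Step names mirror the list above; they are labels, not real CLI commands.
FORK_STEPS: List[str] = [
    "catch_up_core_and_exit",
    "switch_cfg_to_local_quorum",
    "start_core_rpc_horizon",
    "reset_root_thresholds",
    "set_root_native_balance",
    "start_friendbot",
]


def run_fork(actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each fork step in order and return the steps that completed.

    `actions` maps a step name to a caller-supplied function (hypothetical).
    An exception from any step propagates, so later steps never run against
    a half-set-up fork.
    """
    missing = [s for s in FORK_STEPS if s not in actions]
    if missing:
        raise KeyError(f"no action supplied for steps: {missing}")
    done: List[str] = []
    for step in FORK_STEPS:
        actions[step]()
        done.append(step)
    return done
```

Friendbot is last because it depends on the root account having been reset and funded by the two preceding steps.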

Then folks can use the fork like any test network, or they can use the new http endpoint to sub any other data.

Technically it wouldn't allow you to do everything you might want to do. You might want to disable auth on a contract without subbing the entire contract, and subbing ledger entries wouldn't let you do that. But I think the above would get us 80% of the way there, and then we can add other features as needed, such as recording auth like what @dmkozh suggested.

changing the network passphrase is probably not doable as it changes auth, but also all contract IDs (including SAC)

I understand the difficulty with contract IDs. It's unfortunate that we tied the IDs to the network passphrase, because it hasn't turned out to be a benefit. Could we separate the network passphrase/id concept so that a network could change its ID for future signatures (txs, auths) while keeping its "original ID" for contract IDs and other uses?

The risk of a tx accidentally being submitted to pubnet exists. Even though txs won't be naturally circulated to pubnet, there's a footgun opportunity: someone copies a test tx that they're developing with, pastes it into something like the Lab, and accidentally submits it to pubnet. Or someone runs the forked setup in a public CI environment where their private key might be secret, but a signed tx is leaked and someone else submits it to pubnet.

tomerweller commented 3 months ago

The risk of a tx accidentally being submitted to pubnet exists.

Just want to emphasize that this is a very real foot gun if we maintain the same passphrase. Developers often jump between networks and often accidentally submit a transaction in the wrong network (happens to me all the time). If we promote a flow in which local debug transactions are valid on mainnet someone will accidentally submit them on mainnet.

MonsieurNicolas commented 2 months ago

yeah the passphrase issue is quite annoying -- changing it "partially" would require adopting this partial switch all over SDKs etc (ie: SDKs today compute the SAC address for example, so they would need to know about this split and only use the new ID when signing payloads).

Going back to what we're trying to do here: do we really need to fork an entire network's state? What about the original requirement of "continuously syncing to the network"?

Could this work instead be rescoped to just "import and transform" (that can be extended as much as needed with contract specific transforms)? I imagine that the list of entries to import is actually small (and simple to generate), and transforms (like computing different hashes) are also fairly simple to do.

With this paradigm:

"forking" a network is a matter of seconds, even on top of pubnet (that would normally require downloading GBs of data). So even "rebasing" some changes on top of the latest network state should be doable + as the overhead is very low, a fork can be run everywhere (laptop, browser, etc)
separate passphrase -> no risk of signing something valid on existing networks
no need to deal with archived state (something nobody mentioned so far)

to make this work, I think the only core/platform change needed would be to support taking as an argument a file that contains the genesis ledger + its ledger header.
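A rough sketch of the "import and transform" idea, assuming entries are plain dicts. The JSON layout and the `unlock_admin` transform are made-up placeholders; this is not a format stellar-core accepts today.

```python
import json
from typing import Callable, Iterable, List

Entry = dict
Transform = Callable[[Entry], Entry]


def build_genesis(entries: Iterable[Entry], transforms: List[Transform],
                  passphrase: str) -> str:
    """Apply each transform to each imported entry and serialize a genesis file.

    The JSON layout ("passphrase", "entries") is a made-up placeholder, not a
    format stellar-core accepts today.
    """
    out = []
    for entry in entries:
        for transform in transforms:
            entry = transform(entry)
        out.append(entry)
    return json.dumps({"passphrase": passphrase, "entries": out}, sort_keys=True)


def unlock_admin(entry: Entry) -> Entry:
    """Example contract-specific transform: point an admin field at a test key."""
    if "admin" in entry:
        return {**entry, "admin": "TEST_ADMIN"}
    return entry
```

Because the transforms run at import time, the resulting network can use its own passphrase, and no overrides are needed once it is running.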

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 30 days with no activity. It will be closed in 30 days unless the stale label is removed.
