pkel / cpr

consensus protocol research
8 stars 2 forks source link

Dynamic environment with probabilistic termination implemented in Rust #48

Closed pkel closed 4 months ago

pkel commented 6 months ago

My original approach was to

  1. implement a (discrete event) network simulator
  2. implement the consensus protocol
  3. implement a generic attacker for each protocol
  4. expose control of attacker in simulated network against protocol as Python/OpenAI gym
  5. run RL to find attacks

Recently, I came up with simpler approach. See #46 and https://arxiv.org/abs/2309.11924

  1. adopt network assumption of original selfish mining paper
  2. come up with protocol-independent attacker model; it only ignores and withholds information but acts 'honestly' otherwise.
  3. implement honest behaviour for each protocol to be analysed
  4. derive MDPs from mostly naive state space exploration
  5. transform the MDPs into probabilistically terminating MDPs and solve

Now, the MDPs grow too big and my python implementation is too slow, which is kind of expected.

In this PR I want to explore a hybrid approach, taking the best ideas from both worlds:

  1. adopt network assumption of original selfish mining paper
  2. use existing protocol-independent attacker model
  3. use existing protocol specifications
  4. bake probabilistic termination into the model
  5. apply different solvers a. simulate and expose as OpenAI gym, do RL, or b. derive tabular MDP from exhaustive state space exploration

The simulator-based gym has a problem not being Markovian. It uses an internal (non-observable) progress counter which triggers termination. This might cause problems in the RL.

With the new apporach, we should obtain an implicit MDP, maybe POMDP, from step 4. Due to probabilistic termination new model is Markovian. It's also much simpler than simulating individual messages. And it's easier to compare it to existing results.

I'm going for path 5a now, first putting MDP generation aside. Maybe I can keep an eye on it for future code reuse. Simulation is much simpler than exploration.

As this requires rewriting most of #46, I start from scratch. I use Rust with pyo3, because I want to learn rust anyway.

pkel commented 6 months ago

I've implemented a first draft in December 2023. It's using the Bitcoin-only model described at FC '16.

I think the next step is to implement the generic model of https://arxiv.org/abs/2309.11924.

pkel commented 4 months ago

This seems to work now, but I feel no motivation to apply modern RL algorithms to this environment. Implementing this produced some valuable insights though. Tabular methods must suffice. The ideas live on in #49.