Dynamic environment with probabilistic termination implemented in Rust

pkel commented 6 months ago

My original approach was to

implement a (discrete event) network simulator
implement the consensus protocol
implement a generic attacker for each protocol
expose control of attacker in simulated network against protocol as Python/OpenAI gym
run RL to find attacks

Recently, I came up with simpler approach. See #46 and https://arxiv.org/abs/2309.11924

adopt network assumption of original selfish mining paper
come up with protocol-independent attacker model; it only ignores and withholds information but acts 'honestly' otherwise.
implement honest behaviour for each protocol to be analysed
derive MDPs from mostly naive state space exploration
transform the MDPs into probabilistically terminating MDPs and solve

Now, the MDPs grow too big and my python implementation is too slow, which is kind of expected.

In this PR I want to explore a hybrid approach, taking the best ideas from both worlds:

adopt network assumption of original selfish mining paper
use existing protocol-independent attacker model
use existing protocol specifications
bake probabilistic termination into the model
apply different solvers a. simulate and expose as OpenAI gym, do RL, or b. derive tabular MDP from exhaustive state space exploration

The simulator-based gym has a problem not being Markovian. It uses an internal (non-observable) progress counter which triggers termination. This might cause problems in the RL.

With the new apporach, we should obtain an implicit MDP, maybe POMDP, from step 4. Due to probabilistic termination new model is Markovian. It's also much simpler than simulating individual messages. And it's easier to compare it to existing results.

I'm going for path 5a now, first putting MDP generation aside. Maybe I can keep an eye on it for future code reuse. Simulation is much simpler than exploration.

As this requires rewriting most of #46, I start from scratch. I use Rust with pyo3, because I want to learn rust anyway.

pkel commented 6 months ago

I've implemented a first draft in December 2023. It's using the Bitcoin-only model described at FC '16.

I think the next step is to implement the generic model of https://arxiv.org/abs/2309.11924.

pkel commented 4 months ago

This seems to work now, but I feel no motivation to apply modern RL algorithms to this environment. Implementing this produced some valuable insights though. Tabular methods must suffice. The ideas live on in #49.

pkel / cpr

Dynamic environment with probabilistic termination implemented in Rust #48