pkel / cpr

consensus protocol research

Register more useful gyms #27

Closed pkel closed 1 year ago

pkel commented 1 year ago

This PR adds a generic env_fn() that builds sensible Gym environments from serializable inputs. Depending on those inputs, the function applies the appropriate wrappers automatically. It then registers this function as the cpr-v0 Gym environment. Unlike the old core-v0 environment, cpr-v0 can be used with external tools such as ray[rllib] and rl-zoo3.
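The dispatch idea can be sketched in plain Python. Everything below (the Env stand-in, the wrapper names sample_alpha / normalize_by_alpha, and the signature) is hypothetical scaffolding for illustration, not cpr's actual implementation:

```python
import random

class Env:
    """Toy stand-in for a Gym environment; records which wrappers were applied."""
    def __init__(self, protocol, protocol_args):
        self.protocol = protocol
        self.protocol_args = protocol_args
        self.wrappers = []

def env_fn(protocol="nakamoto", protocol_args=None, alpha=0.45, gamma=0.5,
           reward="sparse_relative", normalize_reward=True):
    env = Env(protocol, protocol_args or {})
    # If alpha/gamma are given as a callable or a list, wrap the env so a
    # fresh value is drawn each episode (cf. the lambda/list example).
    if callable(alpha) or isinstance(alpha, (list, tuple)):
        env.wrappers.append("sample_alpha")
    if callable(gamma) or isinstance(gamma, (list, tuple)):
        env.wrappers.append("sample_gamma")
    env.wrappers.append("reward:" + reward)
    if normalize_reward:
        env.wrappers.append("normalize_by_alpha")
    return env

env = env_fn(alpha=lambda: random.uniform(0, 0.5), gamma=[0, 0.5, 0.9])
```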

This PR also adds shortcuts to relevant pre-configured protocols.

The new envs automatically set the number of defenders depending on alpha and gamma.

Examples

gym.make("cpr_gym:cpr-v0", protocol="nakamoto", alpha=0.45, gamma=0.5) # default
gym.make("cpr_gym:cpr-nakamoto-v0") # shortcut for protocol=nakamoto
gym.make("cpr_gym:cpr-v0", alpha=lambda: random.uniform(0, 0.5), gamma=[0, 0.5, 0.9])
gym.make("cpr_gym:cpr-v0", reward="sparse_relative") # default
gym.make("cpr_gym:cpr-v0", reward="sparse_per_progress")
gym.make("cpr_gym:cpr-v0", reward="dense_per_progress")
gym.make("cpr_gym:cpr-v0", normalize_reward=True) # default; divide rewards by alpha
gym.make("cpr_gym:cpr-v0", protocol="tailstorm", protocol_args=dict(k=8, reward='discount', subblock_selection='heuristic'), reward="sparse_per_progress")
gym.make("cpr_gym:cpr-tailstorm-v0") # shortcut for previous line
gym.make("cpr_gym:cpr-tailstorm-v0", protocol_args=dict(reward='constant')) # inherits k=8 and heuristic sb-selection
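One way to read the shortcut behaviour in the last two lines: a shortcut pins default kwargs, and a nested protocol_args override is merged into the shortcut's defaults rather than replacing them. The ids and merge rule below are an illustrative sketch based on the examples above, not cpr's registration code:

```python
# Hypothetical sketch of shortcut resolution; the defaults mirror the
# examples above, but the code is illustrative, not cpr's actual logic.
SHORTCUTS = {
    "cpr-nakamoto-v0": dict(protocol="nakamoto"),
    "cpr-tailstorm-v0": dict(
        protocol="tailstorm",
        protocol_args=dict(k=8, reward="discount", subblock_selection="heuristic"),
        reward="sparse_per_progress",
    ),
}

def resolve(env_id, **overrides):
    kwargs = dict(SHORTCUTS.get(env_id, {}))
    # Merge nested protocol_args so an override like dict(reward='constant')
    # inherits the shortcut's k=8 and heuristic sub-block selection.
    if "protocol_args" in kwargs and "protocol_args" in overrides:
        overrides["protocol_args"] = {**kwargs["protocol_args"],
                                      **overrides["protocol_args"]}
    kwargs.update(overrides)
    return kwargs

args = resolve("cpr-tailstorm-v0", protocol_args=dict(reward="constant"))
```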