wessle / costaware

Repository for cost-aware project code.
MIT License

Seeding RandomMDPEnv #24

Open wessle opened 3 years ago

wessle commented 3 years ago

Issue

We need to be able to seed RandomMDPEnv so that, whenever identical seeds are provided, identical MDPEnvs are produced.

Question

Is this already possible with the current class definition?

wessle commented 3 years ago

Doing

from itertools import product

import numpy as np

# Build ten environments from the same transition seed.
envs = [RandomMDPEnv(10, 10, 'r1', 'c1', transition_seed=1066) for _ in range(10)]
Ps = [env.transition_probabilities for env in envs]
# Lists rather than generators, so they can be indexed and iterated more than once.
pairs = list(product(envs[0].states, envs[0].actions))
all(np.array_equal(P(*pair), Q(*pair)) for P, Q in product(Ps, Ps) for pair in pairs)

returns True, so I think setting transition_seed suffices to guarantee that RandomMDPEnv will always return the same environment.
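For reference, the property being checked can be sketched independently of the repository. The function `random_transition_matrix` below is a hypothetical stand-in, not the actual RandomMDPEnv implementation; it only assumes that the transition probabilities are drawn from a generator built from the seed.

```python
import numpy as np


def random_transition_matrix(num_states, num_actions, seed):
    """Hypothetical stand-in for a seeded random MDP (not the RandomMDPEnv code)."""
    rng = np.random.default_rng(seed)
    # One unnormalized row per (state, action) pair, over next states.
    P = rng.random((num_states, num_actions, num_states))
    # Normalize so each (state, action) row is a probability distribution.
    return P / P.sum(axis=-1, keepdims=True)


P1 = random_transition_matrix(10, 10, seed=1066)
P2 = random_transition_matrix(10, 10, seed=1066)
P3 = random_transition_matrix(10, 10, seed=1067)

same_seed_matches = np.array_equal(P1, P2)        # identical seeds
different_seed_matches = np.array_equal(P1, P3)   # different seeds
```

If the class follows this pattern, identical seeds give identical transition probabilities because each call builds a fresh generator from the seed instead of relying on global random state.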

New Issue

The keyword argument training_seed in RandomMDPEnv appears to be superfluous, since whichever agent is responsible for training should already be making the appropriate call to np.random.seed.

@DavidNKraemer Can we remove training_seed?

DavidNKraemer commented 3 years ago

I would be happy to get rid of training_seed. I think the best option by far is to set the seed in one place and have all the downstream consequences flow from that.
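One possible way to get "one seed, everything flows from it" is NumPy's SeedSequence, which derives independent child streams from a single root seed. This is only a sketch of the idea: the helper `make_generators` and the split into an environment stream and an agent stream are assumptions for illustration, not part of the costaware code.

```python
import numpy as np


def make_generators(seed, n):
    """Hypothetical helper: derive n independent generators from one root seed."""
    root = np.random.SeedSequence(seed)
    return [np.random.default_rng(child) for child in root.spawn(n)]


# A single seed set in one place; the environment and the agent each get a stream.
env_rng, agent_rng = make_generators(1066, 2)
env_draw = env_rng.random(3)
agent_draw = agent_rng.random(3)

# Repeating the setup with the same seed reproduces both streams.
env_rng_again, agent_rng_again = make_generators(1066, 2)
env_draw_again = env_rng_again.random(3)
agent_draw_again = agent_rng_again.random(3)
```

With something like this, a separate training seed is unnecessary: the training randomness is one of the child streams of the single root seed.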

wessle commented 3 years ago

Great, and agreed.