hal3 / macarico

learning to search in pytorch
MIT License
111 stars 12 forks

Support nondeterministic ref #10

Open timvieira opened 7 years ago

timvieira commented 7 years ago

How does the Environment deal with getting a list of actions instead of just one?

The problem is that we want the policy to break the tie.
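
To make the request concrete, here is a minimal sketch of the behaviour being asked for. It does not use macarico's actual API: `reference_actions`, `break_tie`, and the toy cost and score dictionaries are all hypothetical names made up for illustration. The assumption is that the reference returns every action it considers optimal, and the policy's own scores decide among them.

```python
def reference_actions(costs):
    """Hypothetical nondeterministic reference.

    Returns every action tied for the lowest cost, rather than a
    single arbitrarily chosen one.
    """
    best = min(costs.values())
    return sorted(a for a, c in costs.items() if c == best)


def break_tie(policy_scores, ref_actions):
    """Let the policy choose among the reference's tied actions.

    Picks the reference action the policy scores highest, so the
    policy is never pushed away from an action that is equally good.
    """
    return max(ref_actions, key=lambda a: policy_scores[a])


# Toy example (made-up numbers): "left" and "right" are equally good
# under the reference, while the policy prefers "right" among them.
costs = {"left": 0.0, "right": 0.0, "stay": 1.0}
policy_scores = {"left": 0.2, "right": 0.7, "stay": 0.9}

refs = reference_actions(costs)
chosen = break_tie(policy_scores, refs)
print(refs, chosen)
```

Note that "stay" has the highest policy score overall but is not selected, because the choice is restricted to the actions the reference allows.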