pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License
2.25k stars, 297 forks

[Feature] Add `OpenSpielWrapper` and `OpenSpielEnv` #2345

Closed: kurtamohler closed this 1 month ago

kurtamohler commented 2 months ago

Description

Adds environment wrapper classes for OpenSpiel.

`OpenSpielWrapper.reset` supports resetting to a specified state.

Motivation and Context

Part of #2133

pytorch-bot[bot] commented 2 months ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2345

Note: Links to docs will display an error until the docs builds have been completed.

:x: 3 New Failures, 7 Unrelated Failures

As of commit 1c90d04e66be04668beeaa2e7415b3643fea0fb9 with merge base e82a69f5af94cc936c4b872fd2ed499ed33b4f8e:

NEW FAILURES - The following jobs have failed:

* [Habitat Tests on Linux / tests (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2345#29020584169) ([gh](https://github.com/pytorch/rl/actions/runs/10478069966/job/29020584169)) `RuntimeError: Command docker exec -t 0ebcf6625dffb13eaf98fed4e3b81ab80ddd590a16eefa21b00697c196510f58 /exec failed with exit code 139`
* [Libs Tests on Linux / unittests-gym (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2345#29573828612) ([gh](https://github.com/pytorch/rl/actions/runs/10478069968/job/29573828612)) `RuntimeError: Command docker exec -t fb62162595af9a039c6fae565f9fb1c8e44e791df21e76ed1e7852856acd0fcc /exec failed with exit code 1`
* [Libs Tests on Linux / unittests-robohive (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2345#29573829721) ([gh](https://github.com/pytorch/rl/actions/runs/10478069968/job/29573829721)) `test/test_libs.py::TestRoboHive::test_robohive[franka_slide_random-v3-True-True]`

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

* [Build Windows Wheels / pytorch/rl (pytorch/rl, python packaging/wheel/relocate.py, test/smoke_test.py, torchrl) / upload / wheel-py3_9-cuda11_8](https://hud.pytorch.org/pr/pytorch/rl/2345#29034910058) ([gh](https://github.com/pytorch/rl/actions/runs/10478070001/job/29034910058)) (detected as infra flaky with no log or failing log classifier)
* [Build Windows Wheels / pytorch/rl (pytorch/rl, python packaging/wheel/relocate.py, test/smoke_test.py, torchrl) / upload / wheel-py3_9-cuda12_1](https://hud.pytorch.org/pr/pytorch/rl/2345#29034910206) ([gh](https://github.com/pytorch/rl/actions/runs/10478070001/job/29034910206)) (detected as infra flaky with no log or failing log classifier)
* [Build Windows Wheels / pytorch/rl (pytorch/rl, python packaging/wheel/relocate.py, test/smoke_test.py, torchrl) / upload / wheel-py3_9-cuda12_4](https://hud.pytorch.org/pr/pytorch/rl/2345#29034910348) ([gh](https://github.com/pytorch/rl/actions/runs/10478070001/job/29034910348)) (detected as infra flaky with no log or failing log classifier)
* [Continuous Benchmark (PR) / CPU Pytest benchmark](https://hud.pytorch.org/pr/pytorch/rl/2345#29020581867) ([gh](https://github.com/pytorch/rl/actions/runs/10478069953/job/29020581867)) (detected as infra flaky with no log or failing log classifier)
* [Continuous Benchmark (PR) / GPU Pytest benchmark](https://hud.pytorch.org/pr/pytorch/rl/2345#29020582558) ([gh](https://github.com/pytorch/rl/actions/runs/10478069953/job/29020582558)) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

* [Build Windows Wheels / pytorch/rl (pytorch/rl, python packaging/wheel/relocate.py, test/smoke_test.py, torchrl) / upload / wheel-py3_9-cpu](https://hud.pytorch.org/pr/pytorch/rl/2345#29034909895) ([gh](https://github.com/pytorch/rl/actions/runs/10478070001/job/29034909895)) ([trunk failure](https://hud.pytorch.org/pytorch/rl/commit/e82a69f5af94cc936c4b872fd2ed499ed33b4f8e#28752759909))
* [Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2345#29020588518) ([gh](https://github.com/pytorch/rl/actions/runs/10478069961/job/29020588518)) ([trunk failure](https://hud.pytorch.org/pytorch/rl/commit/e82a69f5af94cc936c4b872fd2ed499ed33b4f8e#28740460021)) `test/test_transforms.py::TestKLRewardTransform::test_kl_lstm`

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kurtamohler commented 2 months ago

A few notes:

`categorical_action_encoding=False` is not supported yet.

Also, some of the games in OpenSpiel do not work properly in `OpenSpielWrapper` because the action spec assumes a discrete space of size `pyspiel.Game.num_distinct_actions()` (see OpenSpiel API reference). However, for some of the games, `pyspiel.State.legal_actions()` can return more actions than `pyspiel.Game.num_distinct_actions()`. I suppose to support those games we need to allow the action spec's size to change at each step?
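To make the mismatch concrete, here is a minimal pure-Python sketch of a fixed-size legal-action mask. It does not use `OpenSpielWrapper` or pyspiel; `legal_action_mask` is a hypothetical helper, and its two arguments stand in for the values returned by `pyspiel.Game.num_distinct_actions()` and `pyspiel.State.legal_actions()`. It only illustrates why an action index outside the assumed range cannot be represented.

```python
from typing import List, Sequence


def legal_action_mask(
    num_distinct_actions: int, legal_actions: Sequence[int]
) -> List[bool]:
    """Build a boolean mask over a fixed-size discrete action space.

    Hypothetical helper: the arguments stand in for
    pyspiel.Game.num_distinct_actions() and pyspiel.State.legal_actions().
    """
    # Any index outside [0, num_distinct_actions) has no slot in the mask.
    out_of_range = [a for a in legal_actions if not 0 <= a < num_distinct_actions]
    if out_of_range:
        raise ValueError(
            f"actions {out_of_range} do not fit a space of size "
            f"{num_distinct_actions}"
        )
    mask = [False] * num_distinct_actions
    for action in legal_actions:
        mask[action] = True
    return mask
```

With a space of size 4, `legal_action_mask(4, [0, 2])` gives `[True, False, True, False]`, while `legal_action_mask(4, [0, 5])` raises `ValueError`.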

kurtamohler commented 1 month ago

At the moment, this only supports games where all actions are taken by the players, like in chess or tic-tac-toe. But I've realized that OpenSpiel also has a concept of chance nodes, where a random non-player action is taken. For instance, in Kuhn poker, the initial dealing of the cards is a chance node. In liar's dice, the outcome of rolling dice is a chance node. OpenSpiel has some methods to obtain all the possible chance actions and the associated probability distribution (shown in this example).

For now, I'll raise an error if a loaded game contains chance nodes and leave it as a TODO. I might wait to add support for it in a follow-up PR, unless you would prefer that I add it in this one.
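For reference, a rough sketch of how chance nodes are usually resolved, following the pattern in OpenSpiel's Python examples. This is not what the PR implements. `is_chance_node()`, `chance_outcomes()`, and `apply_action()` are `pyspiel.State` methods; `ToyDiceState` and `resolve_chance_nodes` are hypothetical names, and the toy class only exists so the snippet runs without pyspiel installed.

```python
import random
from typing import List, Tuple


class ToyDiceState:
    """Hypothetical stand-in for a pyspiel.State at a chance node (one die roll)."""

    def __init__(self) -> None:
        self.rolled = None

    def is_chance_node(self) -> bool:
        # The state is a chance node until the die has been rolled.
        return self.rolled is None

    def chance_outcomes(self) -> List[Tuple[int, float]]:
        # (action, probability) pairs, the same shape pyspiel returns.
        return [(face, 1.0 / 6.0) for face in range(6)]

    def apply_action(self, action: int) -> None:
        self.rolled = action


def resolve_chance_nodes(state, rng: random.Random) -> List[int]:
    """Sample and apply chance actions until a non-chance node is reached."""
    taken = []
    while state.is_chance_node():
        actions, probs = zip(*state.chance_outcomes())
        action = rng.choices(actions, weights=probs, k=1)[0]
        state.apply_action(action)
        taken.append(action)
    return taken
```

The same loop should work on a real `pyspiel.State`, since it relies only on those three methods.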

vmoens commented 1 month ago

> I suppose to support those games we need to allow the action spec's size to change at each step?

Yes I think having a dynamic space would be the way to go. See #2143

vmoens commented 1 month ago

> OpenSpiel has some methods to obtain all the possible chance actions and the associated probability distribution (shown in this example).

That's an amazing feature. Happy to integrate it separately!

kurtamohler commented 1 month ago

> > I suppose to support those games we need to allow the action spec's size to change at each step?
>
> Yes I think having a dynamic space would be the way to go. See #2143

Actually, I'm not so sure that we would need dynamic action specs after all. It is true that `pyspiel.State.legal_actions()` can return a different number of actions than `pyspiel.Game.num_distinct_actions()`, but I'm pretty sure (not 100% sure) that only happens when it's a chance node, in which case the length of `legal_actions()` is `pyspiel.Game.max_chance_outcomes()` instead. Once I add support for chance nodes, it will have its own action spec separate from the players' action specs, and I think all action specs will maintain the same shape throughout the game.
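A small sketch of that idea, assuming the hunch above holds (chance actions are bounded by `max_chance_outcomes()` and player actions by `num_distinct_actions()`). `fixed_shape_masks` is a hypothetical helper, not part of this PR or of torchrl, and the sizes passed in are purely illustrative.

```python
from typing import Dict, List, Sequence


def fixed_shape_masks(
    is_chance_node: bool,
    legal_actions: Sequence[int],
    num_distinct_actions: int,
    max_chance_outcomes: int,
) -> Dict[str, List[bool]]:
    """Return one mask per action space; both keep a constant length.

    Hypothetical helper: the arguments stand in for values obtained from
    pyspiel.State.is_chance_node(), pyspiel.State.legal_actions(),
    pyspiel.Game.num_distinct_actions() and pyspiel.Game.max_chance_outcomes().
    """
    player_mask = [False] * num_distinct_actions
    chance_mask = [False] * max_chance_outcomes
    # Legal actions are written into the space that is active at this node.
    target = chance_mask if is_chance_node else player_mask
    for action in legal_actions:
        target[action] = True
    return {"player": player_mask, "chance": chance_mask}
```

At a chance node, only the chance mask has entries set; at a player node, only the player mask does. Neither list changes length between steps, which is the property that would make dynamic specs unnecessary.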

kurtamohler commented 1 month ago

I think I've addressed everything that needed to be fixed so far. Let me know if there is anything else.