sakakibara-yuuki / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
MIT License

CQL example #5

Open sakakibara-yuuki opened 10 months ago

sakakibara-yuuki commented 10 months ago

I was trying out the CQL example at https://github.com/sakakibara-yuuki/rl/blob/main/examples/cql/cql_offline.py and got the following warnings:

/home/sakakibara/project/cql/cql_offline.py:29: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path=".", config_name="offline_config")
/home/sakakibara/.pyenv/versions/3.9.13/lib/python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.

The Hydra usage is outdated, so the CQL example needs to be rewritten.

sakakibara-yuuki commented 10 months ago

It looks like the version_base argument is required.
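
For reference, a minimal sketch of passing version_base explicitly (config_path and config_name are taken from the warning above; the function body is assumed):

import hydra
from omegaconf import DictConfig

# Passing version_base explicitly silences the UserWarning; None opts in to
# current Hydra behavior instead of the version-1.1 compatibility defaults.
@hydra.main(version_base=None, config_path=".", config_name="offline_config")
def main(cfg: DictConfig) -> None:
    ...

if __name__ == "__main__":
    main()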

Also, make_environment emits the warning below, which should be fixed:

/home/sakakibara/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gymnasium/core.py:311: UserWarning: WARN: env.reward_space to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.reward_space` for environment variables or `env.get_wrapper_attr('reward_space')` that will search the reminding wrappers
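
Going by the warning text itself, the fix is presumably to read reward_space from the base env rather than through the wrapper chain. A hedged sketch (the env id is only an illustration; reward_space may not be defined at all, hence the getattr):

import gymnasium as gym

env = gym.make("Pendulum-v1")  # stand-in for whatever make_environment builds
# Deprecated: reading env.reward_space through the wrapper chain.
# Read it from the base env instead; getattr guards against it being absent.
reward_space = getattr(env.unwrapped, "reward_space", None)
# Alternatively, search the remaining wrappers explicitly:
# reward_space = env.get_wrapper_attr("reward_space")
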
sakakibara-yuuki commented 10 months ago

When using loss = CQLLoss(...), the data passed to loss(data) must be structured like

import torch
from tensordict import TensorDict

batch = (32,)                        # batch dims (example values)
n_obs, n_act = 4, 2                  # observation/action sizes (example values)
action = torch.randn(*batch, n_act)

data = TensorDict({
    "observation": torch.randn(*batch, n_obs),
    "action": action,
    ("next", "done"): torch.zeros(*batch, 1, dtype=torch.bool),
    ("next", "reward"): torch.randn(*batch, 1),
    ("next", "observation"): torch.randn(*batch, n_obs),
}, batch)

i.e. the nested "next" entries are implicitly required as in_keys.

The tutorial says as much.
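
For context, a minimal end-to-end sketch of driving CQLLoss with such a TensorDict, assuming TorchRL's stock modules (all sizes, names, and the spec below are illustrative assumptions, not code from this repo):

import torch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule, NormalParamExtractor
from torchrl.data import BoundedTensorSpec
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal, ValueOperator
from torchrl.objectives import CQLLoss

n_obs, n_act = 4, 2   # example sizes
batch = (8,)

# Policy: observation -> (loc, scale) -> TanhNormal distribution
actor_module = TensorDictModule(
    torch.nn.Sequential(
        MLP(in_features=n_obs, out_features=2 * n_act, num_cells=[64]),
        NormalParamExtractor(),
    ),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)
actor = ProbabilisticActor(
    actor_module,
    in_keys=["loc", "scale"],
    spec=BoundedTensorSpec(-torch.ones(n_act), torch.ones(n_act), (n_act,)),
    distribution_class=TanhNormal,
)

# Q-network: (observation, action) -> state_action_value
qvalue = ValueOperator(
    MLP(in_features=n_obs + n_act, out_features=1, num_cells=[64]),
    in_keys=["observation", "action"],
)

loss = CQLLoss(actor_network=actor, qvalue_network=qvalue)

data = TensorDict({
    "observation": torch.randn(*batch, n_obs),
    "action": torch.rand(*batch, n_act) * 2 - 1,
    ("next", "done"): torch.zeros(*batch, 1, dtype=torch.bool),
    # newer TorchRL versions also expect a "terminated" entry:
    ("next", "terminated"): torch.zeros(*batch, 1, dtype=torch.bool),
    ("next", "reward"): torch.randn(*batch, 1),
    ("next", "observation"): torch.randn(*batch, n_obs),
}, batch)

loss_vals = loss(data)           # returns a TensorDict of loss terms
print(sorted(loss_vals.keys()))  # e.g. loss_actor, loss_cql, loss_qvalue, ...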

sakakibara-yuuki commented 10 months ago

The CQLLoss implementation looks strongly influenced by the DQN tutorial.

sakakibara-yuuki commented 10 months ago

Also need to read up carefully on tensordict.