takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

ValueError: too many values to unpack (expected 4) when using hopper-medium-v0 environment #395

Closed (sky-story closed this issue 3 months ago)

sky-story commented 3 months ago

Hi, it's me again (hhh). When I run this example:

import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQLConfig().create(device='cuda:0')

# train
cql.fit(
    dataset,
    n_steps=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)

I find that the step function in the d4rl environment returns five values instead of the traditional four. The detailed output follows:

Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
Warning: GymBullet failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'pybullet_envs'
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/envs/registration.py:555: UserWarning: WARN: The environment hopper-medium-v0 is out of date. You should consider upgrading to version `v2`.
  logger.warn(
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py:190: UserWarning: WARN: This version of the mujoco environments depends on the mujoco-py bindings, which are no longer maintained and may stop working. Please upgrade to the v4 versions of the environments (which depend on the mujoco python bindings instead), unless you are trying to precisely replicate previous works).
  logger.warn(
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/d4rl/gym_mujoco/gym_envs.py:13: UserWarning: This environment is deprecated. Please use the most recent version of this environment.
  offline_env.OfflineEnv.__init__(self, **kwargs)
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
load datafile: 100%|████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 12.77it/s]
2024-05-19 18:41.08 [info     ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) observation_signature=Signature(dtype=[dtype('float32')], shape=[(11,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-05-19 18:41.08 [info     ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1>
2024-05-19 18:41.08 [info     ] Action size has been automatically determined. action_size=3
2024-05-19 18:41.09 [info     ] dataset info                   dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(11,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=3)
2024-05-19 18:41.09 [info     ] Directory is created at d3rlpy_logs/CQL_20240519184109
2024-05-19 18:41.09 [debug    ] Building models...
2024-05-19 18:41.10 [debug    ] Models have been built.
2024-05-19 18:41.10 [info     ] Parameters                     params={'observation_shape': [11], 'action_size': 3, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.0001, 'critic_learning_rate': 0.0003, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.0001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False, 'max_q_backup': False}}}
Epoch 1/10: 100%|██████████████████████| 10000/10000 [03:08<00:00, 53.09it/s, critic_loss=-39.5, conservative_loss=-42.8, alpha=0.638, actor_loss=-68.7, temp=0.673, temp_loss=2.04]
Traceback (most recent call last):
  File "my_cql.py", line 10, in <module>
    cql.fit(
  File "/root/d3rlpy/d3rlpy/algos/qlearning/base.py", line 422, in fit
    results = list(
  File "/root/d3rlpy/d3rlpy/algos/qlearning/base.py", line 588, in fitter
    test_score = evaluator(self, dataset)
  File "/root/d3rlpy/d3rlpy/metrics/evaluators.py", line 544, in __call__
    return evaluate_qlearning_with_environment(
  File "/root/d3rlpy/d3rlpy/metrics/utility.py", line 65, in evaluate_qlearning_with_environment
    observation, reward, done, truncated, _ = env.step(action)
  File "/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 50, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/d4rl/utils/wrappers.py", line 165, in step
    next_obs, reward, done, info = wrapped_step
ValueError: too many values to unpack (expected 4)
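
For context, my understanding is that this comes from the Gym API change: older Gym returns a 4-tuple from step, while newer Gym (roughly 0.26+) returns a 5-tuple, and the d4rl wrapper above still unpacks four values. A minimal sketch of the mismatch (assuming a working mujoco-py install; version boundaries are approximate):

import gym

env = gym.make('Hopper-v2')
env.reset()
action = env.action_space.sample()

# Old Gym API (before ~0.26): step returned a 4-tuple
# obs, reward, done, info = env.step(action)

# New Gym API (~0.26+): step returns a 5-tuple, with done split into two flags
obs, reward, terminated, truncated, info = env.step(action)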

Can you tell me how to fix this? Thank you!

takuseno commented 3 months ago

I think you're using Farama's D4RL package in your experiment. Please try this:

$ pip uninstall D4RL
$ d3rlpy install d4rl

In this way, d3rlpy will install my fork of the D4RL package from https://github.com/takuseno/D4RL , which fixes some of the incompatibilities.
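
After reinstalling, a quick way to sanity-check which step API your installed environment follows (a rough sketch; it assumes the d4rl dataset environments register on import):

import gym
import d4rl  # noqa: F401, importing registers the dataset environments

env = gym.make('hopper-medium-v0')
env.reset()
ret = env.step(env.action_space.sample())
print(len(ret))  # 4 with the old API; d3rlpy's evaluator unpacks 5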

sky-story commented 3 months ago

> I think you're using Farama's D4RL package in your experiment. Please try this:
>
> $ pip uninstall D4RL
> $ d3rlpy install d4rl
>
> In this way, d3rlpy will install my fork of the D4RL package from https://github.com/takuseno/D4RL , which fixes some of the incompatibilities.

Thanks for the suggestion! I originally installed the D4RL package using the following commands:

pip install d3rlpy
pip install git+https://github.com/Farama-Foundation/D4RL
pip install -U gym
pip uninstall pybullet

However, it appears that this setup defaults to Farama's D4RL package. I'll give your approach a try using d3rlpy install d4rl. By the way, it seems that the command d3rlpy install d4rl requires Python 3.9?

takuseno commented 3 months ago

> By the way, it seems that the command d3rlpy install d4rl requires Python 3.9?

I don't think so. Did you see any errors?

sky-story commented 3 months ago

> > By the way, it seems that the command d3rlpy install d4rl requires Python 3.9?
>
> I don't think so. Did you see any errors?

@takuseno Sorry for the delayed response. Yes, I have some evidence that this issue exists. I am using Python 3.8, and when I run the command to install d4rl, I get the following error:

(drl_project) root@autodl-container-362e44a99f-d27d57c6:~# d3rlpy install d4rl
Traceback (most recent call last):
  File "/root/miniconda3/envs/drl_project/bin/d3rlpy", line 5, in <module>
    from d3rlpy.cli import cli
  File "/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/d3rlpy/cli.py", line 352, in <module>
    name: list[str], upgrade: bool = False, check: bool = True
TypeError: 'type' object is not subscriptable

I suspect it is because subscripting built-in types in annotations, like list[str], is only supported in Python 3.9 and above (PEP 585).
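
A minimal reproduction outside of d3rlpy, with a hypothetical install function standing in for the CLI handler:

# On Python 3.8, subscripting built-in types at runtime raises, so a module
# containing this fails as soon as the def statement executes:
#
#   def install(name: list[str]) -> None: ...
#   TypeError: 'type' object is not subscriptable
#
# Two 3.8-compatible alternatives:

from typing import List

def install(name: List[str]) -> None:  # typing generics work on 3.5+
    print(name)

# or put `from __future__ import annotations` at the very top of the module,
# which defers annotation evaluation and makes list[str] legal on 3.7+.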

takuseno commented 3 months ago

Thanks for following up on this! Yeah, you're right. In the latest commit, I've updated these lines: https://github.com/takuseno/d3rlpy/commit/18d710a9306c48b3cee63f22c20e0e67aff16020

If you install d3rlpy from source, it should work with Python 3.8.
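
For example (one standard route, assuming git is available):

$ pip install git+https://github.com/takuseno/d3rlpy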

sky-story commented 3 months ago

That's great! I'll close this issue.