takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

[BUG] Setting parameters in CQL does not work #205

Closed: HYDesmondLiu closed this 2 years ago

HYDesmondLiu commented 2 years ago

Describe the bug

Even though some parameters are set, they are not fed into the CQL algorithm.

To Reproduce

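import argparse

import d3rlpy
import gym
import d4rl  # registers the D4RL environments/datasets with gym
from d3rlpy.metrics.scorer import evaluate_on_environment

parser = argparse.ArgumentParser()
parser.add_argument("--eval_freq", type=int, required=True)
args = parser.parse_args()

dataset, env = d3rlpy.datasets.get_dataset("hopper-medium-v2")
cql = d3rlpy.algos.CQL(use_gpu=True)

def eval_policy(policy):
    # average return of the policy over several episodes in env
    return evaluate_on_environment(env)(policy)

for t in range(1000000):
    # intended: exactly one gradient step per call
    cql.fit(dataset, n_steps=1, n_steps_per_epoch=1)
    if (t + 1) % args.eval_freq == 0:
        eval_policy(cql)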
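For reference, the way fit in d3rlpy 1.x appears to turn `n_epochs`, `n_steps` and `n_steps_per_epoch` into a training schedule can be modeled as below. This is a simplified sketch based on my reading of the 1.x source, not the library's code; `plan_fit` is a hypothetical helper. With `n_steps`, a random-sampling iterator runs `n_steps // n_steps_per_epoch` epochs of `n_steps_per_epoch` steps each; with `n_epochs`, a round-robin iterator makes one full pass over the transitions per epoch.

```python
from typing import Optional, Tuple


def plan_fit(
    n_transitions: int,
    batch_size: int,
    n_epochs: Optional[int] = None,
    n_steps: Optional[int] = None,
    n_steps_per_epoch: int = 10000,
) -> Tuple[str, int, int]:
    """Hypothetical model: return (iterator kind, epochs, steps per epoch)."""
    if n_epochs is None and n_steps is not None:
        # step-based schedule: a fixed number of randomly sampled batches per epoch
        if n_steps < n_steps_per_epoch:
            raise ValueError("n_steps must be >= n_steps_per_epoch")
        return "random", n_steps // n_steps_per_epoch, n_steps_per_epoch
    if n_epochs is not None and n_steps is None:
        # epoch-based schedule: each epoch is one full pass over the transitions
        return "round", n_epochs, n_transitions // batch_size
    raise ValueError("exactly one of n_epochs or n_steps must be given")


# the call from the reproduction script vs. an epoch-based call (illustrative sizes)
print(plan_fit(500_000, 256, n_steps=1, n_steps_per_epoch=1))
print(plan_fit(500_000, 256, n_epochs=1))
```

Under this model the call in the script should run a single step, while an epoch-based call runs a full pass over the data. The "RoundIterator is selected" and "Epoch 1/1" lines in the log are consistent with the second case, which suggests (but does not prove) that the logged run was started with n_epochs rather than n_steps.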

Expected behavior

Only one training step should run per call. Instead, it runs 1953 steps, and I wonder where this number comes from. use_gpu is set to True, but training was not running on the GPU.
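One possible reading of the `'use_gpu': 0` entry in the saved parameters in the log: in d3rlpy 1.x, `use_gpu=True` seems to be normalized to GPU device index 0 before the parameters are written, so `0` would mean device `cuda:0` rather than `False`. The sketch below models that kind of normalization; `normalize_use_gpu` is a hypothetical helper, not d3rlpy's implementation.

```python
from typing import Optional, Union


def normalize_use_gpu(value: Union[bool, int, None]) -> Optional[int]:
    """Map a use_gpu argument to a CUDA device index, or None for CPU (hypothetical)."""
    # bool must be checked before int: isinstance(True, int) is True in Python
    if isinstance(value, bool):
        return 0 if value else None
    if value is None:
        return None
    if isinstance(value, int):
        if value < 0:
            raise ValueError("device index must be non-negative")
        return value
    raise TypeError("use_gpu must be bool, int or None")


print(normalize_use_gpu(True))   # GPU device index 0
print(normalize_use_gpu(False))  # None, i.e. CPU
```

Checking for bool before int matters because bool is a subclass of int in Python; with the checks reversed, True would be passed through as the integer 1 instead of being mapped to device 0.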


2022-08-08 12:08.07 [debug    ] RoundIterator is selected.
2022-08-08 12:08.07 [info     ] Directory is created at d3rlpy_logs/CQL_20220808120807
2022-08-08 12:08.07 [debug    ] Building models...
2022-08-08 12:08.08 [debug    ] Models have been built.
2022-08-08 12:08.08 [info     ] Parameters are saved to d3rlpy_logs/CQL_20220808120807/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0001, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_learning_rate': 0.0001, 'alpha_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_threshold': 10.0, 'batch_size': 256, 'conservative_weight': 5.0, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_learning_rate': 0.0003, 'critic_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'gamma': 0.99, 'generated_maxlen': 100000, 'initial_alpha': 1.0, 'initial_temperature': 1.0, 'n_action_samples': 10, 'n_critics': 2, 'n_frames': 1, 'n_steps': 1, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': None, 'soft_q_backup': False, 'tau': 0.005, 'temp_learning_rate': 0.0001, 'temp_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'use_gpu': 0, 'algorithm': 'CQL', 'observation_shape': (20,), 'action_size': 2}
Epoch 1/1: 100%|█████████████████| 1953/1953 [01:32<00:00, 21.17it/s, temp_loss=-1.1, temp=1.07, alpha_loss=13.6, alpha=0.913, critic_loss=-10.9, actor_loss=4.73]
2022-08-08 12:09.41 [info     ] CQL_20220808120807: epoch=1 step=1953 epoch=1 metrics={'time_sample_batch': 0.0028074814366244438, 'time_algorithm_update': 0.04355938167249735, 'temp_loss': -1.0992966500352697, 'temp': 1.0667595759881074, 'alpha_loss': 13.566169590478943, 'alpha': 0.9133139546870941, 'critic_loss': -10.887224008487056, 'actor_loss': 4.734693383637776, 'time_step': 0.046753879699472645} step=1953
2022-08-08 12:09.41 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220808120807/model_1953.pt
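The 1953 in the progress bar can be sanity-checked with arithmetic. Assuming an epoch-based run makes one pass over the data and drops the last partial batch, its length is the number of transitions divided by batch_size (256 in the saved parameters), rounded down. Inverting that gives the range of dataset sizes that produce exactly 1953 steps; the helper names below are illustrative.

```python
BATCH_SIZE = 256  # 'batch_size' from the saved params.json


def steps_per_epoch(n_transitions: int, batch_size: int = BATCH_SIZE) -> int:
    # one pass over the data, last partial batch dropped (assumption)
    return n_transitions // batch_size


def transitions_for_steps(n_steps: int, batch_size: int = BATCH_SIZE) -> tuple:
    # smallest and largest dataset sizes that give exactly n_steps full batches
    return n_steps * batch_size, (n_steps + 1) * batch_size - 1


low, high = transitions_for_steps(1953)
print(low, high)
```

So the logged epoch corresponds to a full pass over roughly 500,000 transitions, not one step. Note also that the logged observation_shape (20,) and action_size 2 differ from Hopper's 11-dimensional observations and 3-dimensional actions, so this log may come from a different dataset than the hopper-medium-v2 snippet.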

Additional context

Ubuntu 18.04, d3rlpy==1.1.1, Python 3.9.12

HYDesmondLiu commented 2 years ago

Solved on my own.