takuseno / d3rlpy-benchmarks

Benchmark data for d3rlpy
MIT License

Atari dataset for the benchmark results #5

Closed VincentLiu3 closed 1 year ago

VincentLiu3 commented 1 year ago

I am wondering how you got the Atari results using d3rlpy. It seems to me that d3rlpy does not support training with all 50 datasets because the data is too large. Can you provide details on which dataset was used to generate the results, or the scripts you used to generate them? Thank you!

takuseno commented 1 year ago

You're probably looking for this: https://github.com/takuseno/d3rlpy/blob/master/reproductions/offline/discrete_cql.py

Note that I only benchmarked the 1% data setting.

VincentLiu3 commented 1 year ago

I see. Thank you.

Is there any way I can train the model with all datasets? For example, can I sample 1% of the data and train for K steps, then sample another 1% and train for K more steps? Does the current code support training with different datasets?

takuseno commented 1 year ago

By "all datasets", do you mean using all the data of one dataset? In my notation above, that would be 100%. In that case, you can simply set fraction to 1.0 in this line: https://github.com/takuseno/d3rlpy/blob/361510571af5dcaf9da41986eedba6f077d401cf/reproductions/offline/discrete_cql.py#L17
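As a rough back-of-the-envelope check on why the full setting is heavy, the sketch below estimates raw frame storage for a given fraction. It is only an estimate under stated assumptions: 50 datasets of 1M transitions each (the figures mentioned in this thread), one 84x84 grayscale uint8 frame per transition, and the fraction applied to each dataset separately. The helper name `frame_storage_gb` is hypothetical and not part of d3rlpy.

```python
# Assumptions (not taken from d3rlpy itself):
#   - 50 replay datasets per game, 1M transitions each
#   - each transition stores one 84x84 grayscale uint8 frame
#   - the fraction is applied to every dataset separately
FRAME_BYTES = 84 * 84
TRANSITIONS_PER_DATASET = 1_000_000
NUM_DATASETS = 50


def frame_storage_gb(fraction):
    """Estimate raw frame storage in GB (10**9 bytes) for a data fraction."""
    per_dataset = round(TRANSITIONS_PER_DATASET * fraction)
    total_transitions = per_dataset * NUM_DATASETS
    return total_transitions * FRAME_BYTES / 1e9


for fraction in (0.01, 1.0):
    print(f"fraction={fraction}: ~{frame_storage_gb(fraction):.3f} GB of frames")
```

Under these assumptions the 1% setting needs a few GB of frames, while the 100% setting needs a few hundred GB before counting rewards, actions, or any copies made during loading.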

"Does the current code support training with different datasets?"

I'm not sure I follow this. d3rlpy is designed to be agnostic to datasets. You can use your own datasets.

VincentLiu3 commented 1 year ago

Sorry for the confusion. By "all datasets", I meant all 50 replay datasets (as used in Agarwal et al., 2020). I want to reproduce the results in Agarwal et al. (2020), rather than the results obtained with only the 1M dataset.

However, the issue is that I cannot load all 50 datasets at once. I was trying to do something like this:

for i in range(M):
    load one dataset at random
    train the model for K steps using that dataset

Does the current code base support something like this? I tried calling cql.fit() with different datasets, but I was not sure whether it was doing what I wanted (I was not sure whether the model is re-initialized every time I call fit). Thank you!

takuseno commented 1 year ago

You can do it like this:

import numpy as np
import d3rlpy

NUM_DATASETS = 50

# Create the algorithm once, outside the loop, so the same model is reused.
cql = d3rlpy.algos.DiscreteCQLConfig(
    learning_rate=5e-5,
    optim_factory=d3rlpy.models.optimizers.AdamFactory(eps=1e-2 / 32),
    batch_size=32,
    alpha=4.0,
    q_func_factory=d3rlpy.models.q_functions.QRQFunctionFactory(
        n_quantiles=200
    ),
    observation_scaler=d3rlpy.preprocessing.PixelObservationScaler(),
    target_update_interval=2000,
    reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0),
).create(device=None)

for epoch in range(1000):
    # Pick one replay dataset at random. This assumes the epoch datasets
    # are numbered 1..NUM_DATASETS (as in d4rl-atari), so 0 is excluded.
    dataset_epoch = np.random.randint(1, NUM_DATASETS + 1)
    dataset, env = d3rlpy.datasets.get_atari(
        f"breakout-epoch-{dataset_epoch}-v0", num_stack=4
    )

    env_scorer = d3rlpy.metrics.EnvironmentEvaluator(env, epsilon=0.001)

    # fit() builds the networks only on the first call; later calls
    # continue training the existing model on the new dataset.
    cql.fit(
        dataset,
        n_steps=2000000 // 4,
        n_steps_per_epoch=125000,
        evaluators={"environment": env_scorer},
        experiment_name=f"DiscreteCQL_breakout_{epoch}",
    )
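If you want to plan the rotation before launching a long run, the following minimal sketch may help. It uses only the standard library and does not call d3rlpy. The helper names `make_schedule` and `total_gradient_steps` are hypothetical, the 1-based dataset numbering is an assumption, and shuffling without replacement is a swapped-in alternative to the purely random pick in the loop above: it visits every dataset once before repeating any.

```python
import random

NUM_DATASETS = 50               # replay datasets per game, as in this thread
STEPS_PER_ROUND = 2000000 // 4  # n_steps passed to each fit() call above


def make_schedule(num_rounds, num_datasets=NUM_DATASETS, seed=0):
    """Return one dataset index per round (assumed 1-based).

    Indices come from repeatedly shuffling 1..num_datasets, so every
    dataset is visited once before any dataset is visited a second time.
    """
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < num_rounds:
        block = list(range(1, num_datasets + 1))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:num_rounds]


def total_gradient_steps(num_rounds, steps_per_round=STEPS_PER_ROUND):
    """Total number of updates across all fit() calls."""
    return num_rounds * steps_per_round


schedule = make_schedule(num_rounds=100)
print(schedule[:5])
print(total_gradient_steps(num_rounds=1000))
```

With 1000 rounds of 500,000 steps each, the loop above would perform 500 million updates in total, so you may want to reduce the number of rounds or the steps per round to fit your compute budget.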

VincentLiu3 commented 1 year ago

I see. Thank you so much!