VincentLiu3 closed this issue 1 year ago
Probably, you're looking for this? https://github.com/takuseno/d3rlpy/blob/master/reproductions/offline/discrete_cql.py
Note that I only benchmarked 1% data settings.
I see. Thank you.
Is there a way I can train the model with all the datasets? For example, can I sample 1% of the data, train for K steps, then sample another 1% and train for another K steps? Does the current code support training with different datasets?
Does "all datasets" mean using all the data of one dataset, i.e. 100% in my notation above? In that case, you can simply pass 1.0 as fraction in this line:
https://github.com/takuseno/d3rlpy/blob/361510571af5dcaf9da41986eedba6f077d401cf/reproductions/offline/discrete_cql.py#L17
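For intuition, here is a minimal sketch of what a fraction-style subsampling step does conceptually. The helper name, seeding, and episode representation are hypothetical, not the actual implementation in the reproduction script:

```python
import numpy as np

def sample_fraction(episodes, fraction, seed=0):
    # Hypothetical helper: keep a random `fraction` of the episodes,
    # mirroring the idea behind the `fraction` argument in the script.
    rng = np.random.default_rng(seed)
    n = max(1, int(len(episodes) * fraction))
    indices = rng.choice(len(episodes), size=n, replace=False)
    return [episodes[i] for i in indices]

episodes = list(range(100))
subset = sample_fraction(episodes, fraction=0.01)  # 1% setting
full = sample_fraction(episodes, fraction=1.0)     # 100% setting
```

With fraction=1.0 the "subsample" is simply the whole dataset, which is why specifying 1.0 trains on all the data.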
Does the current code support training with different dataset?
I'm not sure I follow this. d3rlpy is designed to be agnostic to datasets. You can use your own datasets.
Sorry for the confusion. By "all datasets", I meant all 50 replay datasets (as used in Agarwal et al., 2020). I wanted to reproduce the results in Agarwal et al. (2020), rather than the results using only the 1M dataset.
However, the issue is that I cannot load all 50 datasets at once. I was trying to do something like:
for i in range(M):
    load 1 dataset randomly
    train the model for K steps using the dataset
Does the current code base support something like this? I was trying to call cql.fit() with different datasets, but I was not sure it was doing what I wanted (I was not sure whether the model was re-initialized every time I called fit). Thank you!
You can do it like this:
import numpy as np

import d3rlpy

NUM_DATASETS = 50

cql = d3rlpy.algos.DiscreteCQLConfig(
    learning_rate=5e-5,
    optim_factory=d3rlpy.models.optimizers.AdamFactory(eps=1e-2 / 32),
    batch_size=32,
    alpha=4.0,
    q_func_factory=d3rlpy.models.q_functions.QRQFunctionFactory(
        n_quantiles=200
    ),
    observation_scaler=d3rlpy.preprocessing.PixelObservationScaler(),
    target_update_interval=2000,
    reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0),
).create(device=None)

for epoch in range(1000):
    dataset_epoch = np.random.randint(NUM_DATASETS)
    dataset, env = d3rlpy.datasets.get_atari(f"breakout-epoch-{dataset_epoch}-v0", num_stack=4)
    env_scorer = d3rlpy.metrics.EnvironmentEvaluator(env, epsilon=0.001)
    cql.fit(
        dataset,
        n_steps=2000000 // 4,
        n_steps_per_epoch=125000,
        evaluators={"environment": env_scorer},
        experiment_name=f"DiscreteCQL_breakout_{epoch}",
    )
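To see why this pattern works, here is a framework-agnostic toy sketch: the algorithm object is created once outside the loop, built lazily on the first fit call, and subsequent fit calls keep updating the same parameters rather than re-initializing them. The ToyModel class here is entirely made up for illustration; it only mimics the update-in-place behavior of the cql object above:

```python
import numpy as np

class ToyModel:
    # Stand-in for an algorithm object like `cql`: built once, updated in place.
    def __init__(self):
        self.weights = None

    def fit(self, data, n_steps):
        if self.weights is None:        # lazy build on the first fit() call
            self.weights = np.zeros(data.shape[1])
        for _ in range(n_steps):        # toy update: move toward the data mean
            self.weights += 0.1 * (data.mean(axis=0) - self.weights)

model = ToyModel()
snapshots = []
for seed in range(3):                   # mimic cycling over random datasets
    rng = np.random.default_rng(seed)
    model.fit(rng.normal(size=(64, 4)), n_steps=10)
    snapshots.append(model.weights.copy())
# the same weights persist across fit() calls instead of being reset
```

In the same way, each cql.fit() call above continues training the one cql instance on a newly loaded dataset.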
I see. Thank you so much!
I am wondering how you got the Atari results using d3rlpy. It seems to me that d3rlpy does not support training with all 50 datasets, because the data is too large to load at once. Could you provide details on which datasets were used to generate the results, or the scripts you used to generate them? Thank you!