takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

Question regarding the MOPO algorithm #95

Open ssimonc opened 3 years ago

ssimonc commented 3 years ago

Hi @takuseno, first of all, thanks for the great work.

I've a question regarding the MOPO algorithm, specifically about the ProbabilisticEnsembleDynamics.

In the original paper, the authors state:

Across all domains, we train an ensemble of 7 models and pick the best 5 models based on their prediction error on a hold-out set of 1000 transitions in the offline dataset. Each of the model in the ensemble is parametrized as a 4-layer feedforward neural network with 200 hidden units and after the last hidden layer, the model outputs the mean and variance using a two-head architecture. Spectral normalization is applied to all layers except the head that outputs the model variance.
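The architecture in this quote can be illustrated with a small, framework-agnostic NumPy sketch of the forward pass. It is not d3rlpy's implementation. The class name `TwoHeadDynamics`, the ReLU activation, the log-variance parametrization, and the reading of "4-layer" as four hidden layers are all assumptions made for illustration. Spectral normalization is applied exactly (via SVD) once at construction; during training it would have to be re-applied after every weight update.

```python
import numpy as np


def spectral_normalize(weight):
    """Rescale a weight matrix so that its largest singular value equals 1."""
    sigma = np.linalg.svd(weight, compute_uv=False)[0]
    return weight / sigma


class TwoHeadDynamics:
    """Hypothetical toy model: shared hidden layers, then mean and variance heads."""

    def __init__(self, input_dim, output_dim, hidden=200, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        dims = [input_dim] + [hidden] * n_layers
        # hidden layers are spectrally normalized
        self.hidden_weights = [
            spectral_normalize(rng.normal(size=(n_in, n_out)))
            for n_in, n_out in zip(dims[:-1], dims[1:])
        ]
        self.hidden_biases = [np.zeros(n_out) for n_out in dims[1:]]
        # the mean head is normalized, the variance head is left untouched
        self.mean_weight = spectral_normalize(rng.normal(size=(hidden, output_dim)))
        self.logvar_weight = 0.01 * rng.normal(size=(hidden, output_dim))

    def forward(self, x):
        h = np.asarray(x, dtype=float)
        for w, b in zip(self.hidden_weights, self.hidden_biases):
            h = np.maximum(h @ w + b, 0.0)  # ReLU is an assumption
        mean = h @ self.mean_weight
        variance = np.exp(h @ self.logvar_weight)  # exp keeps the variance positive
        return mean, variance


# Pendulum-sized example: input = observation (3) + action (1),
# output = next observation (3) + reward (1)
model = TwoHeadDynamics(input_dim=4, output_dim=4)
batch = np.random.default_rng(1).normal(size=(8, 4))
mean, variance = model.forward(batch)
print(mean.shape, variance.shape)
```

Leaving the variance head unnormalized mirrors the quoted sentence: only the layers that feed the mean prediction are constrained.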

To reproduce the paper, I started from your example in the docs:

from d3rlpy.datasets import get_pendulum
from d3rlpy.dynamics import ProbabilisticEnsembleDynamics
from d3rlpy.metrics.scorer import dynamics_observation_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_reward_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_prediction_variance_scorer
from sklearn.model_selection import train_test_split

dataset, _ = get_pendulum()

train_episodes, test_episodes = train_test_split(dataset)

dynamics = ProbabilisticEnsembleDynamics(learning_rate=1e-4, use_gpu=True)

# fit() is called the same way as for the algorithms
dynamics.fit(train_episodes,
             eval_episodes=test_episodes,
             n_epochs=100,
             scorers={
                'observation_error': dynamics_observation_prediction_error_scorer,
                'reward_error': dynamics_reward_prediction_error_scorer,
                'variance': dynamics_prediction_variance_scorer,
             })

from d3rlpy.algos import MOPO

# load trained dynamics model
dynamics = ProbabilisticEnsembleDynamics.from_json('<path-to-params.json>/params.json')
dynamics.load_model('<path-to-model>/model_xx.pt')

# pass the trained dynamics model to MOPO
mopo = MOPO(dynamics=dynamics)

With this setup, I don't see a way to train 7 models and keep the best 5, as the paper describes. Am I missing something, or can this be a feature to work on?

takuseno commented 3 years ago

@ssimonc Sorry for the late reply. The model picking feature is not implemented yet. Currently, the main focus is reproducing all the model-free algorithms for paper publication. After that, we're going to spend more time on the model-based algorithms. Sorry for the inconvenience.
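The selection step from the paper (keep the 5 of 7 models with the lowest prediction error on a hold-out set) can be sketched independently of the library. The following is a minimal NumPy sketch, assuming each model's predictions on the hold-out transitions are already available as an array. The function `select_elite_models` and the toy data are hypothetical and not part of d3rlpy, and mean squared error stands in for whichever error measure the paper's authors used.

```python
import numpy as np


def select_elite_models(predictions, targets, n_elites=5):
    """Return the indices of the n_elites models with the lowest hold-out MSE.

    predictions: array of shape (n_models, n_holdout, dim)
    targets:     array of shape (n_holdout, dim)
    """
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # mean squared error of each model over all hold-out transitions
    errors = ((predictions - targets[None]) ** 2).mean(axis=(1, 2))
    # stable sort, so ties resolve to the lower model index
    elite_indices = np.argsort(errors, kind="stable")[:n_elites]
    return elite_indices, errors


# Toy data: 7 "models" that are the true targets plus noise of different scales
rng = np.random.default_rng(0)
targets = rng.normal(size=(1000, 4))  # 1000 hold-out transitions
noise_scales = [0.7, 0.1, 0.5, 0.2, 0.6, 0.3, 0.4]
predictions = np.stack(
    [targets + scale * rng.normal(size=targets.shape) for scale in noise_scales]
)

elite_indices, errors = select_elite_models(predictions, targets, n_elites=5)
print(elite_indices)  # best model first
```

In the toy data, the two noisiest models (indices 0 and 4) are the ones dropped. How the selected indices would then be wired back into an ensemble depends on the library's internals and is left open here.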