ericyue opened this issue 7 months ago
@ericyue Hi, thanks for the issue. I would redirect you to the following papers since this is a general offline RL question.
@takuseno I have similar doubts to @ericyue's. Could you please elaborate a little more? The papers are interesting but very theoretical; could you provide a more practical example?
I'm building an offline RL model from custom collected log data, and I'm not sure how to assess the trained model's performance. One way is to add evaluators such as `TDErrorEvaluator`; the other is to train a separate `d3rlpy.ope.FQE` on the test dataset and look at `soft_opc` or other metrics. Since both approaches use the test dataset and compute some metric, which one should I use to evaluate the model?
```python
# BCQConfig is only a config object and has no fit(); create() builds the algo.
model = d3rlpy.algos.BCQConfig(xxxx).create()
ret = model.fit(
    train_dataset,
    n_steps=N_STEPS,
    n_steps_per_epoch=N_STEPS_PER_EPOCH,
    logger_adapter=logger_adapter,
    save_interval=10,
    evaluators={
        "test_td_error": TDErrorEvaluator(episodes=test_dataset.episodes),
        "test_value_scale": AverageValueEstimationEvaluator(episodes=test_dataset.episodes),
        "test_init_value": InitialStateValueEstimationEvaluator(episodes=test_dataset.episodes),
    },
)
```
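To make the `soft_opc` side of the comparison concrete, here is a minimal NumPy sketch of the idea behind soft off-policy classification (Irpan et al., 2019), which is what d3rlpy's `SoftOPCEvaluator` reports: the gap between the mean estimated Q-value on "successful" episodes (return at or above a threshold) and the mean over all episodes. A larger positive gap suggests the Q-function ranks good behavior above average behavior. The function name `soft_opc_gap` and the toy data below are illustrative, not part of the d3rlpy API.

```python
import numpy as np

def soft_opc_gap(episode_returns, estimated_values, return_threshold):
    """Illustrative sketch of the soft-OPC metric: mean estimated Q-value
    over (state, action) pairs from successful episodes minus the mean
    over all episodes. `estimated_values` holds one array of per-step
    Q-value estimates for each episode."""
    all_values = np.concatenate(estimated_values)
    success_values = np.concatenate(
        [v for r, v in zip(episode_returns, estimated_values)
         if r >= return_threshold]
    )
    return float(success_values.mean() - all_values.mean())

# Toy check: the two high-return episodes carry higher Q-estimates,
# so the gap comes out positive.
returns = [10.0, 2.0, 12.0]
values = [np.array([1.0, 1.2]), np.array([0.2, 0.3]), np.array([1.1, 0.9])]
print(soft_opc_gap(returns, values, return_threshold=8.0))
```

Note the contrast with the evaluators in the snippet above: `TDErrorEvaluator` and friends score the policy's own Q-function against held-out transitions, while FQE fits a fresh Q-network to the learned policy and then derives metrics like this one from it, so the two kinds of numbers answer related but different questions.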