ericyue opened this issue 7 months ago
@ericyue Hi, thanks for the issue. I would redirect you to the following papers since this is a general offline RL question.
@takuseno I have similar doubts to @ericyue's. Could you please elaborate a little more? The papers are interesting but very theoretical; could you provide a more practical example?
I'm building an offline RL model from custom collected log data, and I'm not sure how to assess the trained model's performance. One way is to add evaluators such as `TDErrorEvaluator`; the other is to train a separate `d3rlpy.ope.FQE` on the test dataset and look at `soft_opc` or other metrics. Since both approaches use the test dataset and compute some metric, which one should I use to evaluate the model?
```python
# BCQConfig is only a config object and has no fit(); create() builds the algo.
model = d3rlpy.algos.BCQConfig(xxxx).create()
ret = model.fit(
    train_dataset,
    n_steps=N_STEPS,
    n_steps_per_epoch=N_STEPS_PER_EPOCH,
    logger_adapter=logger_adapter,
    save_interval=10,
    evaluators={
        "test_td_error": TDErrorEvaluator(episodes=test_dataset.episodes),
        "test_value_scale": AverageValueEstimationEvaluator(episodes=test_dataset.episodes),
        "test_init_value": InitialStateValueEstimationEvaluator(episodes=test_dataset.episodes),
    },
)
```
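To make the `soft_opc` side of the comparison concrete, here is a minimal NumPy sketch of the idea behind soft off-policy classification (Irpan et al., 2019), which is what d3rlpy's `SoftOPCEvaluator` reports: the gap between the mean estimated Q-value on "successful" episodes (return at or above a threshold) and the mean over all episodes. A larger positive gap suggests the Q-function ranks good behavior above average behavior. The function name `soft_opc_gap` and the toy data below are illustrative, not part of the d3rlpy API.

```python
import numpy as np

def soft_opc_gap(episode_returns, estimated_values, return_threshold):
    """Illustrative sketch of the soft-OPC metric: mean estimated Q-value
    over (state, action) pairs from successful episodes minus the mean
    over all episodes. `estimated_values` holds one array of per-step
    Q-value estimates for each episode."""
    all_values = np.concatenate(estimated_values)
    success_values = np.concatenate(
        [v for r, v in zip(episode_returns, estimated_values)
         if r >= return_threshold]
    )
    return float(success_values.mean() - all_values.mean())

# Toy check: the two high-return episodes carry higher Q-estimates,
# so the gap comes out positive.
returns = [10.0, 2.0, 12.0]
values = [np.array([1.0, 1.2]), np.array([0.2, 0.3]), np.array([1.1, 0.9])]
print(soft_opc_gap(returns, values, return_threshold=8.0))
```

Note the contrast with the evaluators in the snippet above: `TDErrorEvaluator` and friends score the policy's own Q-function against held-out transitions, while FQE fits a fresh Q-network to the learned policy and then derives metrics like this one from it, so the two kinds of numbers answer related but different questions.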