hakuhodo-technologies / scope-rl

SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection
https://scope-rl.readthedocs.io/en/latest/
Apache License 2.0

Is Evaluation-of-OPE possible with real-world data? #30

Open pmoran3 opened 3 months ago

pmoran3 commented 3 months ago

Since evaluation-of-OPE requires knowledge of the true on-policy policy values of the candidate policies, is OPS only relevant for synthetic data, where those values can be computed in the environment? Or is it possible to estimate the on-policy policy values from real-world data as well?
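
For reference, my understanding is that in the synthetic setting the ground truth comes from Monte Carlo rollouts of each candidate policy in the environment itself. A minimal sketch of that computation, assuming a Gymnasium-style env and an illustrative policy.predict interface (not SCOPE-RL's API):

import numpy as np

def estimate_on_policy_value(env, policy, n_episodes=100, gamma=1.0):
    """Monte Carlo estimate of J(pi) = E_pi[sum_t gamma^t r_t].

    Running the policy in env requires a simulator (or an online test),
    which is exactly what a fixed real-world logged dataset lacks.
    """
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()  # Gymnasium-style reset -> (obs, info)
        done, t, ret = False, 0, 0.0
        while not done:
            action = policy.predict(obs)  # illustrative policy interface
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            ret += (gamma ** t) * reward
            t += 1
        returns.append(ret)
    return float(np.mean(returns))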

When I run this code block from the basic_synthetic_continuous_advanced.ipynb notebook on my real-world dataset

from scope_rl.ope import OffPolicySelection

# ope and cd_ope are the OffPolicyEvaluation and CumulativeDistributionOPE
# instances constructed earlier in the notebook
ops = OffPolicySelection(
    ope=ope,
    cumulative_distribution_ope=cd_ope,
)

# obtain the ground-truth selection result, which requires the true
# on-policy policy value of every candidate policy in input_dict
ops.obtain_true_selection_result(
    input_dict=input_dict,
    return_variance=True,
    return_lower_quartile=True,
    return_conditional_value_at_risk=True,
    return_by_dataframe=True,
)

I get the following error:

ValueError: one of the candidate policies, cql, does not contain on-policy policy value in input_dict.
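
This error suggests that each candidate policy's entry in input_dict must already carry its true on-policy value. If a simulator were available, I imagine it could be filled in along these lines (the on_policy_policy_value key is inferred from the error message, and candidate_policies is a hypothetical name, so treat this as a sketch rather than the documented SCOPE-RL schema):

# Hypothetical sketch: key name inferred from the error message;
# candidate_policies and estimate_on_policy_value (sketched above)
# are illustrative, not part of SCOPE-RL's API.
for name, policy in candidate_policies.items():
    input_dict[name]["on_policy_policy_value"] = estimate_on_policy_value(
        env, policy, n_episodes=100
    )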

Edit: After posting this issue, it occurred to me that estimating the on-policy policy value from real-world data would itself just be OPE, so evaluation-of-OPE would not be possible in that case. Please correct me if I am misunderstanding anything.
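
To state the circularity precisely (notation mine, not SCOPE-RL's): evaluation-of-OPE scores an estimator $\hat{J}$ against the true policy value, e.g. via

$$
\mathrm{MSE}(\hat{J}; \pi) = \mathbb{E}_{\mathcal{D}}\left[\left(\hat{J}(\pi; \mathcal{D}) - J(\pi)\right)^{2}\right],
\qquad
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{T-1} \gamma^{t} r_{t}\right].
$$

With a simulator, $J(\pi)$ comes from on-policy rollouts. With only a fixed logged dataset $\mathcal{D}$, any stand-in for $J(\pi)$ would have to be $\hat{J}'(\pi; \mathcal{D})$ for some other estimator $\hat{J}'$, so the "ground truth" would itself be an OPE estimate.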