Since the evaluation-of-OPE requires knowledge of the on-policy policy values, is OPS only relevant for synthetic data where the underlying behavior policy value is known? Or is it possible to estimate the on-policy policy value from real-world data, as well?
When I run this code block from the basic_synthetic_continious_advanced.ipynb notebook on my real-world dataset, I get the following error:
ValueError: one of the candidate policies, cql, does not contain on-policy policy value in input_dict.
Edit: After posting this issue, it occurred to me that "to estimate the on-policy policy value from real-world data" would just be equivalent to doing OPE, so evaluation-of-OPE would not be possible in that case. Please correct me if I am misunderstanding anything.
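To make my reasoning in the edit concrete, here is a toy sketch (plain NumPy, not the SCOPE-RL API) of a one-step setting: any "estimate of the target policy's value from behavior-policy logs" is itself an OPE estimate (here, importance sampling), so it cannot serve as the ground-truth on-policy value needed to evaluate OPE estimators against. The policies and rewards below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step (bandit) setting with two actions and deterministic rewards.
true_rewards = np.array([1.0, 2.0])

# Behavior policy collected the logs; the target policy is the one we want to evaluate.
behavior_probs = np.array([0.7, 0.3])
target_probs = np.array([0.2, 0.8])

# True on-policy value of the target policy -- known here only because the
# environment is synthetic. With real-world data this quantity is unavailable.
true_value = float(np.dot(target_probs, true_rewards))  # 0.2*1.0 + 0.8*2.0 = 1.8

# Logged dataset generated by the behavior policy.
actions = rng.choice(2, size=100_000, p=behavior_probs)
rewards = true_rewards[actions]

# Estimating the target policy's value from these logs IS off-policy evaluation:
# here, the ordinary importance-sampling estimator.
weights = target_probs[actions] / behavior_probs[actions]
ope_estimate = float(np.mean(weights * rewards))

print(f"true on-policy value: {true_value:.3f}")
print(f"IS estimate from logs: {ope_estimate:.3f}")
```

So with real-world data alone, the "ground truth" and the thing being evaluated collapse into the same kind of estimate, which is why (as I understand it) evaluation-of-OPE/OPS needs either a simulator or separate online rollouts of each candidate policy.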