Open ericyue opened 9 months ago
Hi @ericyue,
Thank you for reaching out. If you only want to learn a policy (i.e., you do not need off-policy evaluation (OPE)), you can transform the logged data as follows:
from d3rlpy.dataset import MDPDataset

# Only the four fields below are read from the logged data.
offlinerl_dataset = MDPDataset(
    observations=train_logged_dataset["state"],
    actions=train_logged_dataset["action"],
    rewards=train_logged_dataset["reward"],
    terminals=train_logged_dataset["done"],
)
(See also: https://scope-rl.readthedocs.io/en/latest/documentation/quickstart.html)
This transformation does not require "pscore", so it should work on your dataset.
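If your custom log is stored as per-episode records rather than as the dictionary used above, a minimal sketch of the conversion could look like the following. The helper name `build_logged_dataset` and the `(state, action, reward)` record layout are assumptions for illustration, not part of SCOPE-RL; only the four keys read by the `MDPDataset` call are taken from this thread.

```python
# Hypothetical helper (not part of SCOPE-RL or d3rlpy): collects per-step
# records into the four fields that the MDPDataset call reads.
def build_logged_dataset(episodes):
    """episodes: list of episodes, each a list of (state, action, reward) tuples."""
    dataset = {"state": [], "action": [], "reward": [], "done": []}
    for episode in episodes:
        for step, (state, action, reward) in enumerate(episode):
            dataset["state"].append(state)
            dataset["action"].append(action)
            dataset["reward"].append(reward)
            # Mark only the final step of each episode as terminal.
            dataset["done"].append(1.0 if step == len(episode) - 1 else 0.0)
    return dataset


# Toy log with two episodes (made-up values, for illustration only).
episodes = [
    [([0.0, 1.0], 0, 0.5), ([1.0, 0.0], 1, 1.0)],
    [([0.5, 0.5], 1, 0.0)],
]
train_logged_dataset = build_logged_dataset(episodes)
```

The values here are plain Python lists; you would likely need to convert each field to a NumPy array (for example with `numpy.asarray`) before passing it to `MDPDataset`. No "pscore" field is created, which matches the policy-learning-only case.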
@aiueola thanks for replying! I want to do OPE too, and I ran into the same error there. Could you help with this? https://github.com/hakuhodo-technologies/scope-rl/issues/25
Could you provide a more detailed Jupyter notebook showing how to load custom logged data (without pscore) to train a BCQ (or other) model? It would be very helpful!