omar-emara closed this issue 11 months ago
@lucyfarnik I think we could start with something simple, like a behavior cloning policy, as a baseline to compare our more advanced RL methods against. There is a good implementation here: https://imitation.readthedocs.io/en/latest/tutorials/1_train_bc.html
We could also include a user interface for developing our own custom policy, which would replace the expert variable in the example in the documentation above. @vishaljoshi24 would need to add this to the UI if we decide to go down this route.
So, Jonathan and I will sit down in the meeting tomorrow to brainstorm ideas about what the app will look like. We'll sketch a few drafts and then start implementing it in Streamlit asap.
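To make the baseline idea concrete, here is a minimal, library-free sketch of behavior cloning with a hand-written policy standing in for the expert. It does not use the imitation library's API. The toy 1-D environment, `custom_expert`, `collect_demonstrations`, `fit_bc_policy` and `agreement` are all hypothetical names invented for illustration, and the "learning" step is just a majority vote over the expert's actions in each state.

```python
from collections import Counter, defaultdict

# Hypothetical toy task (an assumption, not our real environment):
# an agent on a line of cells 0..N_STATES-1 has to reach GOAL.
N_STATES = 7
GOAL = 5


def custom_expert(state):
    """Hand-written policy playing the role of the tutorial's expert."""
    if state < GOAL:
        return +1  # move right
    if state > GOAL:
        return -1  # move left
    return 0  # already at the goal, stay put


def collect_demonstrations(policy, n_episodes, max_steps):
    """Roll out `policy` and record the (state, action) pairs it produces."""
    demos = []
    for episode in range(n_episodes):
        # Cycle through start states so every cell is visited at least once.
        state = episode % N_STATES
        for _ in range(max_steps):
            action = policy(state)
            demos.append((state, action))
            # Apply the move and clamp to the ends of the line.
            state = min(max(state + action, 0), N_STATES - 1)
            if state == GOAL:
                break
    return demos


def fit_bc_policy(demos, default_action=0):
    """Tabular behavior cloning: copy the expert's most frequent action per state."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    # States never seen in the demonstrations fall back to `default_action`.
    return lambda state: table.get(state, default_action)


def agreement(policy_a, policy_b):
    """Fraction of states on which the two policies choose the same action."""
    matches = sum(policy_a(s) == policy_b(s) for s in range(N_STATES))
    return matches / N_STATES


demos = collect_demonstrations(custom_expert, n_episodes=14, max_steps=10)
bc_policy = fit_bc_policy(demos)
print("agreement with expert:", agreement(bc_policy, custom_expert))
```

In a real setup the lookup table would be replaced by a trained classifier over observations, and `custom_expert` is the part a UI could let us define ourselves.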
@lucyfarnik https://towardsdatascience.com/xrl-explainable-reinforcement-learning-4cd065cdec9a
The article above is a good summary of three papers on XRL. Thought it would be useful to share.
This issue has been opened following the team meeting this week and the request for help with the following: