Open GeorgeS2019 opened 2 weeks ago
This project is based on https://github.com/VowpalWabbit/vowpal_wabbit
Users here would like to know why this particular type of RL was chosen for the typical learning use cases this project targets, besides the nicely designed APIs from vowpal_wabbit.
I'm curious why this serves RL, and how this form of RL relates to known RL frameworks, e.g. SB3.