microsoft / learning-loop

Joiner and Trainer for online reinforcement learning loop
MIT License
3 stars 2 forks

Basic documentation #13

Open GeorgeS2019 opened 2 weeks ago

GeorgeS2019 commented 2 weeks ago

Curious why this serves RL, and how this RL relates to known RL frameworks, e.g. SB3.

rajan-chari commented 2 weeks ago

This project is based on https://github.com/VowpalWabbit/vowpal_wabbit

GeorgeS2019 commented 2 weeks ago

Contextual Bandit algorithms

Logged Contextual Bandit Example

Users here would like to know why this particular type of RL was chosen for the typical learning use cases this project targets, besides the nicely designed APIs of vowpal_wabbit.