Implement BanditLOLS

There's some super-ugliness in BanditLOLS/LinearPolicy that I'd like to get your take on (see lols.py:55,72 and init.py:82-85). The issue is that in order to do CS bLOLS, you need to remember the predicted costs at deviation time, so that you can set the cost vector and set up the regression problem at the end, after you observe the reward. The current approach is to split this, but that's ugly. Another option might be for LinearPolicy to provide something that returns a continuation? Any other ideas?
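To make the continuation idea concrete, here's a rough sketch, not the actual macarico code. Everything in it is hypothetical: `ToyLinearPolicy`, `predict_costs`, and `predict_with_continuation` are made-up names, the policy is a plain per-action linear regressor in pure Python, and the cost vector is assumed to be "the predicted costs with the deviated action's entry overwritten by the observed cost", which is only one possible choice. The point is just that the closure holds on to the predicted costs from deviation time, so the caller never has to store them.

```python
from typing import Callable, List, Tuple


class ToyLinearPolicy:
    """Hypothetical stand-in for LinearPolicy: one weight vector per action."""

    def __init__(self, n_actions: int, n_features: int, lr: float = 0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def predict_costs(self, x: List[float]) -> List[float]:
        # Predicted cost of each action is a dot product with its weights.
        return [sum(wi * xi for wi, xi in zip(w_a, x)) for w_a in self.w]

    def predict_with_continuation(
        self, x: List[float]
    ) -> Tuple[List[float], Callable[[int, float], List[float]]]:
        costs = self.predict_costs(x)

        def update(action: int, observed_cost: float) -> List[float]:
            # Assumed cost vector: keep the predictions made at deviation
            # time, and overwrite only the entry for the deviated action.
            target = list(costs)
            target[action] = observed_cost
            # One SGD step on squared loss; the error is zero for every
            # action except the deviated one.
            for a, w_a in enumerate(self.w):
                err = costs[a] - target[a]
                for i, xi in enumerate(x):
                    w_a[i] -= self.lr * err * xi
            return target

        return costs, update


# At deviation time: predict, and keep the continuation around.
policy = ToyLinearPolicy(n_actions=3, n_features=2)
costs, update = policy.predict_with_continuation([1.0, 0.5])

# At the end of the episode, once the cost of the deviation is observed:
target = update(action=1, observed_cost=1.0)
```

With something like this, BanditLOLS would only need to hold the returned `update` callable until the reward comes in, instead of remembering the predicted costs itself.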

hal3 / macarico

Implement BanditLOLS #6