hal3 / macarico

learning to search in pytorch
MIT License

Implement BanditLOLS #6

Open timvieira opened 7 years ago

hal3 commented 7 years ago

basic implementation is done in https://github.com/hal3/macarico/blob/master/macarico/lts/lols.py

hal3 commented 7 years ago

there's some super-ugliness in BanditLOLS/LinearPolicy that I'd like to get your take on (see lols.py:55,72 and init.py:82-85). the issue is that in order to do CS bLOLS, you need to remember the predicted costs at deviation time, so that you can set the cost vector and set up the regression problem at the end, after you observe the reward. the current approach is to split the prediction and the update into two separate calls, but that's ugly. another option might be for LinearPolicy to provide something that returns a continuation? any other ideas?
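
To make the continuation idea concrete, here is a minimal sketch in plain Python. It is not the macarico API: `ToyLinearPolicy`, `predict_with_continuation`, and the learning rate are hypothetical names chosen for illustration, and the cost vector construction (predicted costs with only the deviated action's entry replaced by the observed cost) is just one simple assumption about how the regression target could be built. The point is that the closure captures the features and the predicted costs at deviation time, so the caller only has to hold on to one callable until the reward arrives.

```python
from typing import Callable, List, Tuple


class ToyLinearPolicy:
    """Hypothetical stand-in for LinearPolicy; one weight row per action."""

    def __init__(self, n_actions: int, n_features: int) -> None:
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def predict_costs(self, features: List[float]) -> List[float]:
        # Linear cost prediction: one dot product per action.
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]

    def predict_with_continuation(
        self, features: List[float], lr: float = 0.1
    ) -> Tuple[List[float], Callable[[int, float], float]]:
        # Predicted costs are computed once, at deviation time.
        predicted = self.predict_costs(features)

        def update(action: int, observed_cost: float) -> float:
            # Assumed target: the remembered predictions, with the deviated
            # action's entry replaced by the cost observed at the end.
            target = list(predicted)
            target[action] = observed_cost
            loss = 0.0
            for a, (p, t) in enumerate(zip(predicted, target)):
                err = p - t
                loss += 0.5 * err * err
                # One squared-loss gradient step on this action's weights.
                for j, x in enumerate(features):
                    self.weights[a][j] -= lr * err * x
            return loss

        return predicted, update


# Usage: predict at the deviation step, finish the rollout, then update.
policy = ToyLinearPolicy(n_actions=3, n_features=2)
costs, update = policy.predict_with_continuation([1.0, 2.0])
loss = update(1, 1.0)  # action 1 was taken; cost 1.0 was observed afterwards
```

With this shape, the learner would store `update` at deviation time and call it once the episode's reward is known, instead of keeping the predicted costs in a separate field. Whether that is actually cleaner than the split approach is a judgment call.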