jannerm / trajectory-transformer

Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"
https://trajectory-transformer.github.io
MIT License

Calculation of value expectation #16

Closed SeanNobel closed 1 year ago

SeanNobel commented 1 year ago

Hi,

I'm having trouble understanding how you calculate the expectation from the probabilities and thresholds here: https://github.com/jannerm/trajectory-transformer/blob/c77076d1c39e8c8edc3d1e5032b55499de556d73/trajectory/utils/discretization.py#L108-L123

I understand that thresholds are quantiles calculated from the empirical distribution, but it's hard for me to grasp why you can get the expectation from the average of those two matrix multiplications.

Could you give me an explanation, or point me to a page I could look at?

Thanks.
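
One way to read the computation, as a minimal sketch rather than the repository's actual code: assume each row of probabilities is a distribution over N bins and the thresholds are the N + 1 bin edges. If a value in a bin is represented by that bin's midpoint (equivalently, if it is uniform within the bin), the expectation is the sum of probability times midpoint. By linearity, that equals the average of the two products with the left and right edges: (p·left + p·right) / 2 = p·((left + right) / 2). The names `thresholds`, `probs`, and `matvec` and all numbers below are hypothetical, chosen only for illustration.

```python
# Hypothetical example values; none of these come from the repository.
thresholds = [0.0, 1.0, 3.0, 7.0]  # N + 1 bin edges for N = 3 bins
probs = [
    [0.2, 0.5, 0.3],  # each row is one distribution over the 3 bins
    [1.0, 0.0, 0.0],
]


def matvec(matrix, vector):
    """Row-wise dot product, a plain-Python analogue of `matrix @ vector`."""
    return [sum(p * v for p, v in zip(row, vector)) for row in matrix]


# The "two matrix multiplications": probabilities against the left edges
# and against the right edges of the bins, then averaged.
left = matvec(probs, thresholds[:-1])
right = matvec(probs, thresholds[1:])
averaged = [(l + r) / 2 for l, r in zip(left, right)]

# The same quantity computed directly as sum_i p_i * midpoint_i.
midpoints = [(lo + hi) / 2 for lo, hi in zip(thresholds[:-1], thresholds[1:])]
direct = matvec(probs, midpoints)

print(averaged)
print(direct)
```

Both lists agree up to floating-point rounding, so under the midpoint assumption the averaged products are just the expected value of the discretized variable.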