leggedrobotics / rsl_rl

Fast and simple implementation of RL algorithms, designed to run fully on GPU.

time out bootstrapping possible bug. #20

Open HyunyoungJung opened 8 months ago

Hi, thank you for sharing this amazing code.

Recently, I've been looking into the detailed implementation of the code in relation to the paper "Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning."

From my understanding of the paper, the reward is bootstrapped with the value estimate when an episode ends in a time-out. I believe this bootstrapping should use the subsequent state, following the formula $r_{\text{new}} = r + v(s')$, where $s'$ is the state resulting from the current step. However, the current implementation appears to compute $r_{\text{new}} = r + v(s)$, where $s$ is the state used for the current step.
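To make the difference concrete, here is a minimal sketch of the two variants on a single made-up transition. It is not taken from the rsl_rl code: the helper `bootstrap_on_timeout`, the `gamma` parameter (set to 1.0 so it matches the formulas above), and the numeric values are all assumptions for illustration only.

```python
def bootstrap_on_timeout(reward, bootstrap_value, timed_out, gamma=1.0):
    """Hypothetical helper: add a value bootstrap to the reward on time-out.

    gamma is an assumed discount factor; gamma=1.0 reproduces r + v(.) as
    written in the formulas above.
    """
    if timed_out:
        return reward + gamma * bootstrap_value
    return reward


# Made-up numbers for one transition (s, a, r, s') that ends in a time-out.
reward = 1.0
value_s = 5.0       # v(s): critic estimate for the state the action was taken in
value_s_next = 4.0  # v(s'): critic estimate for the state reached after the step

# Variant the implementation appears to use: r + v(s)
r_new_current = bootstrap_on_timeout(reward, value_s, timed_out=True)

# Variant I would expect from the paper: r + v(s')
r_new_next = bootstrap_on_timeout(reward, value_s_next, timed_out=True)

# Without a time-out the reward is left unchanged in both variants.
r_no_timeout = bootstrap_on_timeout(reward, value_s_next, timed_out=False)

print(r_new_current, r_new_next, r_no_timeout)
```

The two variants only agree when $v(s) = v(s')$, so any gap between consecutive value estimates shows up directly in the bootstrapped reward.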

Could you please clarify if my understanding aligns with the intended design? I am curious to know whether this implementation choice was deliberate for specific reasons or if it might be an oversight.

Thank you for your time and assistance.