jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License

Bug: Trajectory Dataset contains prematurely truncated trajectories from where PPO gets cut off #12

Open jbloomAus opened 1 year ago

jbloomAus commented 1 year ago

This is a bug in the trajectorywriter/offline dataset: some trajectories get truncated at the moment online training finishes, which leaves “short” truncated trajectories in the dataset. These are bad for our data, and it would be good to remove them. They are visible in the visualization of reward over trajectory length as points on the x-axis at lengths below the max length.

A link to the method I use to ensure that these get labelled as truncated to avoid bugs: https://github.com/jbloomAus/DecisionTransformerInterpretability/blob/c84edb381c53b3f9ef2fa9517e34914a52e15fbd/src/utils.py#L59
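
A minimal sketch of how the removal could look, assuming each stored trajectory carries a terminated/truncated flag and that the environment's max episode length is known. The `Trajectory` class, its fields, and `drop_premature_truncations` are hypothetical names for illustration, not the repo's actual API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container; the real dataset layout may differ."""
    rewards: List[float]
    terminated: bool  # episode ended by reaching a terminal state
    truncated: bool   # episode was cut off (time limit or end of training)


def drop_premature_truncations(
    trajectories: List[Trajectory], max_len: int
) -> List[Trajectory]:
    """Keep every trajectory except those cut off before max_len steps.

    A trajectory truncated exactly at max_len hit the time limit and is kept;
    one truncated earlier was most likely cut off when training stopped.
    """
    kept = []
    for traj in trajectories:
        cut_short = (
            traj.truncated
            and not traj.terminated
            and len(traj.rewards) < max_len
        )
        if not cut_short:
            kept.append(traj)
    return kept


# Example with an assumed max episode length of 5 steps.
example = [
    Trajectory(rewards=[0.0] * 5, terminated=False, truncated=True),   # time limit
    Trajectory(rewards=[0.0] * 2, terminated=False, truncated=True),   # cut off early
    Trajectory(rewards=[0.0, 0.0, 1.0], terminated=True, truncated=False),
]
filtered = drop_premature_truncations(example, max_len=5)
```

This relies on the truncated label being set correctly when the data is written, which is what the linked method is meant to guarantee.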
