vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Correct handling of `termination` vs `truncation`? #457

Open · ankile opened this issue 2 months ago

ankile commented 2 months ago

Hi, thank you so much for the CleanRL resource!

I have a question regarding the PPO implementation and how it handles the difference between episodes that ended because they were terminated (the task was completed) and episodes that were truncated (they ran out of time).

A comment in the advantage calculation suggests that episodes that are not done are to be bootstrapped from the value function.
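For concreteness, here is a minimal NumPy sketch of that kind of GAE loop (names and shapes are simplified, not the repo's exact code): the bootstrap term `gamma * V(s_{t+1})` is dropped whenever a step is marked done, regardless of why the episode ended.

```python
import numpy as np

def compute_gae(rewards, values, dones, next_value, next_done,
                gamma=0.99, gae_lambda=0.95):
    """Sketch of a CleanRL-style GAE loop: the bootstrap term is zeroed
    out whenever `done` is set, no matter whether the episode terminated
    or was merely truncated."""
    num_steps = len(rewards)
    advantages = np.zeros_like(rewards)
    lastgaelam = 0.0
    for t in reversed(range(num_steps)):
        if t == num_steps - 1:
            nextnonterminal = 1.0 - next_done
            nextvalues = next_value
        else:
            nextnonterminal = 1.0 - dones[t + 1]
            nextvalues = values[t + 1]
        # delta = r_t + gamma * V(s_{t+1}) * (1 - done_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * nextvalues * nextnonterminal - values[t]
        advantages[t] = lastgaelam = (
            delta + gamma * gae_lambda * nextnonterminal * lastgaelam
        )
    returns = advantages + values
    return advantages, returns
```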

At the same time, truncations and terminations are OR'd together, so both cases are treated as the same kind of done:

https://github.com/vwxyzjn/cleanrl/blob/8cbca61360ef98660f149e3d76762350ce613323/cleanrl/ppo_continuous_action.py#L221
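In other words (paraphrasing the linked line rather than quoting it verbatim), the two Gymnasium flags get collapsed into a single done signal before they ever reach the advantage computation:

```python
import numpy as np

# Hypothetical single-step example: with the Gymnasium step API the env
# returns separate termination and truncation flags, but OR'ing them makes
# a time-limit truncation indistinguishable from a true terminal state.
terminations = np.array([False])  # did the env reach a terminal state?
truncations = np.array([True])    # did the env hit its time limit?
next_done = np.logical_or(terminations, truncations)
print(next_done)  # [ True] -> GAE will not bootstrap through this step
```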

This seems to go against other findings and implementations, e.g. the paper *Time Limits in Reinforcement Learning* and Stable-Baselines3.

Is the difference here that you assume we're operating in environments with a real episode timeout, so that a truncation actually means failure? In other cases there is no inherent time limit, only a designer's desire for faster task solving, in which case I think it makes sense to handle truncations separately (see the sketch below).
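A minimal sketch of the kind of truncation-aware handling I mean (the helper name and signature are made up for illustration, not CleanRL or Stable-Baselines3 API): before computing advantages, add the bootstrapped value of the final observation to the reward at steps that were truncated rather than terminated.

```python
import numpy as np

def bootstrap_truncated_rewards(rewards, truncations, final_obs, value_fn,
                                gamma=0.99):
    """Hypothetical helper: where an episode was cut off by a time limit
    (truncated) rather than genuinely terminated, add gamma * V(final_obs)
    to that step's reward so the usual GAE loop effectively bootstraps
    through the time limit instead of treating it as a failure."""
    rewards = rewards.copy()
    for idx in np.flatnonzero(truncations):
        rewards[idx] += gamma * value_fn(final_obs[idx])
    return rewards
```

With this reward adjustment the combined done flag can still end the trajectory segment as before, since the missing bootstrap term has already been folded into the reward.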

Have I understood all of this correctly?

pseudo-rnd-thoughts commented 2 months ago

I believe this is being fixed here: https://github.com/vwxyzjn/cleanrl/pull/448