google / brax

Massively parallel rigidbody physics simulation on accelerator hardware.
Apache License 2.0
2.33k stars 255 forks source link

Inconsistency in populating State.info #160

Open namheegordonkim opened 2 years ago

namheegordonkim commented 2 years ago

It seems that the class State defines info as a dictionary, and env wrappers are getting away treating it as such:

@struct.dataclass
class State:
  """Environment state for training and inference."""
  qp: brax.QP
  obs: jp.ndarray
  reward: jp.ndarray
  done: jp.ndarray
  metrics: Dict[str, jp.ndarray] = struct.field(default_factory=dict)
  info: Dict[str, Any] = struct.field(default_factory=dict)

e.g. https://github.com/google/brax/blob/a09f3e75390c1ff017565ba2acfa77e838535fd3/brax/envs/wrappers.py#L67

  def reset(self, rng: jp.ndarray) -> brax_env.State:
    state = self.env.reset(rng)
    state.info['steps'] = jp.zeros(())
    state.info['truncation'] = jp.zeros(())
    return state

However, default environments use self.sys.info to get an Info class object, but it's dumped in favour of creating a completely empty dictionary. e.g. https://github.com/google/brax/blob/a09f3e75390c1ff017565ba2acfa77e838535fd3/brax/envs/hopper.py#L81

  def reset(self, rng: jp.ndarray) -> brax_env.State:
    """Resets the environment to an initial state."""
    rng, rng1, rng2 = jp.random_split(rng, 3)
    qpos = self.sys.default_angle() + jp.random_uniform(
        rng1, (self.sys.num_joint_dof,), -.005, .005)
    qvel = jp.random_uniform(rng2, (self.sys.num_joint_dof,), -.005, .005)
    qp = self.sys.default_qp(joint_angle=qpos, joint_velocity=qvel)
    info = self.sys.info(qp)
    obs = self._get_obs(qp)
    reward, done, zero = jp.zeros(3)
    metrics = {
        'reward_forward': zero,
        'reward_ctrl': zero,
        'reward_healthy': zero,
    }
    return brax_env.State(qp, obs, reward, done, metrics)

So if I'm reading this right, the State.info is not supposed to be coming from self.sys.info? In which case, self.sys.info will never be returned as part of State. Is this intentional?

erikfrey commented 2 years ago

Yes, this is confusing. Info is a struct returned by the brax physics step containing extra metadata (mainly impulses). But it's ALSO extra metadata used by env wrappers for book-keeping. And on top of that, you noticed a useless line in hopper that is probably left over from a refactor. You're right, we're fetching info in hopper and then doing nothing with it. I'll make a note to delete that line.

This will probably become less confusing when we migrate environments out of Brax, then the term 'Info' will be less ambiguous because their two usages will be more separate.