Closed fuyw closed 11 months ago
Hi fuyw,
As you mentioned, the original D4RL antmaze dataset has some issues with rewards and terminals for GCRL, which is why we completely relabeled all terminal labels using the criterion `np.linalg.norm(dataset['observations'][i + 1] - dataset['next_observations'][i]) > 1e-6`. Please note that, after relabeling, our dataset contains exactly 999 trajectories of length 1000. For example, `dataset['dones_float'][552000:553000].sum()` is 1 in our dataset (as opposed to >700 in the original dataset). Also, since we do not use any reward labels from the original dataset (we replace them with a sparse goal-conditioned reward function 1(s = g)), we do not particularly correct them in `get_dataset()`.
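A minimal sketch of this relabeling criterion (the function name and toy data below are illustrative, not from the released code): a transition is marked as a trajectory boundary whenever the next stored observation does not continue from the current transition's next observation.

```python
import numpy as np

def relabel_terminals(dataset, eps=1e-6):
    """Mark transition i as terminal when observations[i + 1] does not
    continue from next_observations[i], i.e. a trajectory boundary."""
    n = len(dataset['observations'])
    dones_float = np.zeros(n, dtype=np.float32)
    for i in range(n - 1):
        gap = np.linalg.norm(dataset['observations'][i + 1]
                             - dataset['next_observations'][i])
        if gap > eps:
            dones_float[i] = 1.0
    dones_float[-1] = 1.0  # the last transition ends the final trajectory
    return dones_float

# Toy buffer with two trajectories: [0 -> 1 -> 2] and [10 -> 11 -> 12]
dataset = {
    'observations':      np.array([[0.0], [1.0], [10.0], [11.0]]),
    'next_observations': np.array([[1.0], [2.0], [11.0], [12.0]]),
}
print(relabel_terminals(dataset))  # [0. 1. 0. 1.]
```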
Many thanks for the reply.
Hi,
I want to ask a question about a bug in the antmaze dataset, specifically antmaze-large-diverse-v2.
In the goal-reaching setting, there should be at most one non-zero-reward ending state per trajectory.
However, the following strategy leads to some trajectories with more than one non-zero-reward transition.
https://github.com/seohongpark/HIQL/blob/af153f5ab6eeed8d983dca9d3b4bbf7f55d1fe54/src/d4rl_utils.py#L47
An example is:
Anyway, this is a small bug in the original dataset: a trajectory should be terminated once it reaches its first non-zero reward.
One solution is to manually apply early stopping at the first non-zero-reward transition in a trajectory and discard the remaining data. Another solution is to use the original `terminal` flag, which leads to many single-step trajectories.

Also, could you please send me the code for the visual ant-maze environment? My email address is yuwei.fu@mail.mcgill.ca.
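The first solution (early stopping at the first non-zero reward) could be sketched as follows; `truncate_after_first_success` and the toy arrays are illustrative names and data, not from the HIQL codebase:

```python
import numpy as np

def truncate_after_first_success(rewards, dones_float):
    """Return a boolean mask over the dataset that keeps, within each
    trajectory (delimited by dones_float == 1), the transitions up to and
    including the first non-zero reward, and discards the rest."""
    keep = np.ones(len(rewards), dtype=bool)
    start = 0
    for i in range(len(rewards)):
        if dones_float[i]:
            nz = np.flatnonzero(rewards[start:i + 1])
            if len(nz) > 0:
                # drop everything after the first success in this trajectory
                keep[start + nz[0] + 1:i + 1] = False
            start = i + 1
    return keep

# Toy data: trajectory 1 has two successes (indices 1 and 2),
# trajectory 2 has none.
rewards = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
dones   = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0])
print(truncate_after_first_success(rewards, dones))
# [ True  True False False  True  True]
```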
Huge thanks.