seohongpark / HIQL

HIQL: Offline Goal-Conditioned RL with Latent States as Actions (NeurIPS 2023)
MIT License

A question about the bug in antmaze dataset. #2

Closed fuyw closed 11 months ago

fuyw commented 11 months ago

Hi,

I want to ask a question about the bug in the antmaze dataset, i.e., antmaze-large-diverse-v2.

In the goal-reaching setting, a trajectory should contain at most one non-zero-reward transition: the final, goal-reaching one.

However, the following relabeling strategy produces some trajectories with more than one non-zero-reward transition.

https://github.com/seohongpark/HIQL/blob/af153f5ab6eeed8d983dca9d3b4bbf7f55d1fe54/src/d4rl_utils.py#L47

An example is:

```python
import gym
import d4rl  # noqa: F401  (registers the antmaze environments)

env = gym.make("antmaze-large-diverse-v2")
dataset = d4rl.qlearning_dataset(env)

# This slice spans what should be a single trajectory, yet it contains
# many non-zero rewards and inconsistent terminal flags:
print(dataset["rewards"][552000:553000])
print(dataset["terminals"][552000:553000])
```

Anyway, this is a small bug in the original dataset: a trajectory should terminate as soon as it reaches its first non-zero reward.

One solution is to manually apply early termination at the first non-zero-reward transition of each trajectory and discard the remaining data. Another is to use the original terminal flags, which leads to many single-step trajectories.
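A minimal sketch of the first solution on synthetic arrays (the `truncate_at_first_reward` helper and the array shapes are my own illustration, not code from the repo or from D4RL):

```python
import numpy as np

def truncate_at_first_reward(rewards, dones):
    """End each trajectory at its first non-zero-reward transition and
    flag everything after it (up to the original trajectory boundary)
    for removal. Returns (new_dones, keep_mask)."""
    dones = dones.astype(bool)
    new_dones = dones.copy()
    keep = np.ones(len(rewards), dtype=bool)
    seen_reward = False
    for i in range(len(rewards)):
        if seen_reward:
            keep[i] = False          # drop the tail after the first reward
        elif rewards[i] != 0:
            new_dones[i] = True      # terminate the trajectory here
            seen_reward = True
        if dones[i]:                 # original trajectory boundary: reset
            seen_reward = False
    return new_dones, keep
```

Filtering the dataset with `keep` then leaves at most one non-zero-reward transition per trajectory, at its end.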

Plus, could you please send me the code of the visual ant-maze environment as well? My email address is yuwei.fu@mail.mcgill.ca.

Huge thanks.

seohongpark commented 11 months ago

Hi fuyw,

As you mentioned, the original D4RL antmaze dataset has some issues with its rewards and terminals for GCRL. That's why we completely relabel all the terminal flags with the criterion `np.linalg.norm(dataset['observations'][i + 1] - dataset['next_observations'][i]) > 1e-6`, i.e., a trajectory boundary is placed wherever the stored next observation does not continue into the following observation.

After relabeling, our dataset contains exactly 999 trajectories of length 1000. For example, `dataset['dones_float'][552000:553000].sum()` is 1 in our dataset (as opposed to >700 in the original dataset).

Also, since we do not use any reward labels from the original dataset (we replace them with the sparse goal-conditioned reward function 1(s = g)), we do not particularly correct them in `get_dataset()`.

fuyw commented 11 months ago

Many thanks for the reply.