takuseno / d4rl-atari

Datasets for data-driven deep reinforcement learning with Atari (wrapper for datasets released by Google)
MIT License

Is the expert data the real expert? #13

Open Ericonaldo opened 1 year ago

Ericonaldo commented 1 year ago

I find that the expert dataset has some problems. For example, for the game 'asterix', when I use the terminal flags to split the trajectories, the maximum return is only around 260. Can you please check this?

import gym
import numpy as np
import d4rl_atari  # registers the Atari dataset environments

env = gym.make('asterix-expert-v0', stack=True)
dataset = env.get_dataset()

# Split trajectories at terminal flags
traj_ends = np.where(dataset['terminals'] == 1)[0]
traj_start_ends = [(0, traj_ends[0])]
for i in range(len(traj_ends) - 1):
    traj_start_ends.append((traj_ends[i] + 1, traj_ends[i + 1]))

# Collect per-trajectory rewards (the slice includes the terminal step)
rewards_list = [dataset['rewards'][start:end + 1]
                for start, end in traj_start_ends]

returns = [np.sum(r) for r in rewards_list]
print(np.mean(returns), np.std(returns), np.max(returns))
Ericonaldo commented 1 year ago

It seems the RL Unplugged dataset uses clipped rewards.
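A quick way to check (a minimal sketch, reusing the `dataset` loaded above; the exact value set is an assumption, but clipping would show up as all rewards lying in [-1, 1]):

import numpy as np

# If rewards are clipped, the unique values should all lie in [-1, 1]
# (typically just {-1, 0, 1} for sign clipping).
print(np.unique(dataset['rewards']))
print(dataset['rewards'].min(), dataset['rewards'].max())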

KeLiChloe commented 1 year ago

Hi, I ran into the same issue. Do you know how to recover the real rewards from the clipped ones? Thanks!

Ericonaldo commented 1 year ago

Not to my knowledge, no.

takuseno commented 1 year ago

Sorry for the super late response. But, yes, the rewards are clipped. Also, let me redirect you to this publication, since this repository simply relies on the datasets provided by Google: https://arxiv.org/abs/1907.04543
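For what it's worth, since clipping discards reward magnitudes, the logged dataset alone cannot be rescaled back. One common workaround (a minimal sketch, not part of this repo; the env id and random policy are placeholders, and it assumes the standard gym Atari envs with ROMs installed) is to evaluate a trained policy online in a plain Atari env, whose step() returns the raw, unclipped game score:

import gym

# Plain Atari env: step() returns the raw game score,
# not the clipped dataset rewards.
env = gym.make('AsterixNoFrameskip-v4')

obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # replace with your trained policy
    obs, reward, done, info = env.step(action)
    total_reward += reward  # raw, unclipped reward

print('episode return (unclipped):', total_reward)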