jurgisp / memory-maze

Evaluating long-term memory of reinforcement learning algorithms
MIT License

[Explanations on offline data] #6

Closed by junmokane 1 year ago

junmokane commented 1 year ago

Hi, I have a question about the offline data specification.

When I loaded one of the npz files, I noticed that keys such as 'action', 'reward', and 'terminal' all have length 1001.

Did you just put dummy 'action', 'reward', and 'terminal' values in the first element?

I mean, if the original sequence is O_0, a_0, r_0, t_0, O_1, a_1, r_1, t_1, ... (O: image, a: action, r: reward, t: terminal), is the offline data stored as O_0, a_{-1}, r_{-1}, t_{-1}, O_1, a_0, r_0, t_0, ... (where a_{-1}, r_{-1}, t_{-1} are dummy values)?

Thanks.

jurgisp commented 1 year ago

Yes, that is correct. The first entries of action, reward, and terminal are indeed dummy zero values.
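
For anyone who needs (O_t, a_t, r_t, t_t, O_{t+1}) tuples, the layout described above can be realigned by dropping the dummy first entry of each non-image array. The following is only a minimal sketch, not code from this repository: the helper `align_transitions`, the `'image'` key name, the array shapes, and the six-action space are assumptions made for illustration, and the episode is synthetic rather than loaded from a real file.

```python
import numpy as np


def align_transitions(episode):
    """Pair each observation O_t with the a_t, r_t, t_t stored at index t + 1.

    Assumes index 0 of 'action', 'reward', and 'terminal' holds dummy zeros,
    as confirmed in the reply above. 'image' is an assumed key name.
    """
    obs = episode["image"]
    return {
        "obs": obs[:-1],                      # O_0 ... O_{T-2}
        "action": episode["action"][1:],      # a_0 ... a_{T-2}
        "reward": episode["reward"][1:],      # r_0 ... r_{T-2}
        "terminal": episode["terminal"][1:],  # t_0 ... t_{T-2}
        "next_obs": obs[1:],                  # O_1 ... O_{T-1}
    }


# Synthetic stand-in for one episode (shapes and dtypes are hypothetical).
T = 1001
rng = np.random.default_rng(0)
episode = {
    "image": np.zeros((T, 8, 8, 3), dtype=np.uint8),
    "action": rng.integers(0, 6, size=T),
    "reward": rng.random(T).astype(np.float32),
    "terminal": np.zeros(T, dtype=bool),
}
# Mimic the dummy zero values in the first slot.
episode["action"][0] = 0
episode["reward"][0] = 0.0

transitions = align_transitions(episode)
print({name: array.shape[0] for name, array in transitions.items()})
```

With a real file, something like `episode = dict(np.load(path))` (where `path` is a placeholder for one of the npz files) should give a dictionary that can be passed to the same helper, provided the key names match.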