Open wjbianjason opened 5 years ago
I find w_write(t) = w_read(t-1) + w_lt(t), why use t-1 timestep read weight rather than t timestep read weight, I think w_read(t) is more related to w_write(t), is there some consideration ? thanks
I find w_write(t) = w_read(t-1) + w_lt(t), why use t-1 timestep read weight rather than t timestep read weight, I think w_read(t) is more related to w_write(t), is there some consideration ? thanks