Closed eric000888 closed 1 year ago
Hi Eric, I think this may be due to the misleading name "replay_buffer_size" in R2D2_GtrXL: it counts sequences, not env steps. A replay buffer size of 10^6 therefore holds 10^6 * sequence length env steps. For cartpole, the sequence length is 200, so that is a huge number. I tested R2D2_GtrXL with your config on my PC. Even with sequence length 25, collecting 10^6 sequences makes the Python process use more than 100 GiB of memory for the data in the replay buffer. I suggest using a smaller replay buffer.
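To make the arithmetic concrete, here is a rough back-of-envelope sketch. The function names are made up for illustration and are not part of any library, and the per-step footprint is only what the ">100 GiB for 10^6 sequences of length 25" observation implies, not a measured value.

```python
GIB = 2 ** 30  # bytes in one GiB


def buffer_env_steps(replay_buffer_size: int, seq_len: int) -> int:
    """Env steps held when the buffer size counts sequences, not steps."""
    return replay_buffer_size * seq_len


def implied_bytes_per_step(total_bytes: float, replay_buffer_size: int, seq_len: int) -> float:
    """Average memory per env step implied by an observed total footprint."""
    return total_bytes / buffer_env_steps(replay_buffer_size, seq_len)


def max_sequences_for_budget(budget_bytes: float, seq_len: int, bytes_per_step: float) -> int:
    """Largest buffer size (in sequences) that fits a memory budget (estimate only)."""
    return int(budget_bytes // (seq_len * bytes_per_step))


if __name__ == "__main__":
    # Sequence length 200 with a buffer of 10^6 sequences.
    print(buffer_env_steps(10 ** 6, 200))  # 200000000 env steps

    # Lower bound per step implied by >100 GiB at sequence length 25.
    per_step = implied_bytes_per_step(100 * GIB, 10 ** 6, 25)
    print(f"~{per_step:.0f} bytes per env step (lower bound)")

    # Hypothetical 8 GiB budget: how many sequences would fit?
    print(max_sequences_for_budget(8 * GIB, 25, per_step))
```

The real footprint depends on the observation shape and whatever else is stored per step, so treat the result only as a starting point for picking a smaller buffer.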
The following is the config file I used.