Closed MurrayMa0816 closed 8 months ago
Hi, glad that you found the work useful.
To answer your first question: yes, in general the 'unroll length' is much smaller than the average episode length, for example around 1/10 of it. But there's no universally good ratio; you'd have to experiment and find out, depending on the task at hand.
Regarding the second question, unfortunately I don't have prior experience with the MiniGrid environment. My advice is to try a much smaller 'unroll length', or better yet, try the NGU agent (or Agent57) instead of R2D2, since NGU and Agent57 are much better at hard-exploration problems, especially when rewards are sparse. I suspect your task might be one of those (but I'm not sure).
One last thing to point out: it is also possible that there are unknown bugs in our work, so it may not behave as intended. If you find one, feel free to report it.
Hi, @michaelnny ,
Thanks for your repository, it has helped me a lot. I encountered an issue while using it and would like to seek your advice.
When using the R2D2 method, the data generated by the actor's interaction with the environment is first stored in an Unroll. Then, when the Unroll is full or when done=True, the data inside the Unroll is placed into a queue.
I think the parameter "unroll_length" should be set to less than the maximum episode length of the environment; otherwise, the "Unroll" may not be filled before "done=True" is reached, leaving "self._last_unroll=None" and causing data to be lost. I'm unsure whether this reasoning is correct and would appreciate your advice, thank you.
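To make the concern concrete, here is a minimal sketch of the accumulate-then-flush pattern described above. Note this is a hypothetical illustration, not the repository's actual class (the names `UnrollBuffer` and `add` are my own): it flushes a partial unroll at episode end so that no transitions are dropped even when `unroll_length` exceeds the episode length.

```python
from collections import deque


class UnrollBuffer:
    """Hypothetical sketch of R2D2-style unroll accumulation.

    Transitions are buffered until the unroll is full or the episode
    ends; the (possibly partial) unroll is then pushed to a queue.
    """

    def __init__(self, unroll_length, queue):
        self.unroll_length = unroll_length
        self.queue = queue
        self._storage = []

    def add(self, transition, done):
        self._storage.append(transition)
        if len(self._storage) == self.unroll_length or done:
            # Flush even a partial unroll at episode end, so data is
            # not silently lost when unroll_length > episode length.
            self.queue.append(list(self._storage))
            self._storage.clear()


# Example: unroll_length=4, but the episode ends after 6 steps,
# producing one full unroll (4 transitions) and one partial (2).
q = deque()
buf = UnrollBuffer(unroll_length=4, queue=q)
for t in range(6):
    buf.add(t, done=(t == 5))
```

If the flush-on-done branch were missing, the second (partial) unroll would never reach the queue, which matches the missing-data scenario described above.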