michaelnny / deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.
Apache License 2.0

the self.add() of "Unroll" in replay.py #19

Closed MurrayMa0816 closed 8 months ago

MurrayMa0816 commented 9 months ago

Hi @michaelnny,

Thanks for your repository; it has helped me a lot. I encountered an issue while using it and would like to ask for your advice.

When using the R2D2 agent, the transitions generated by the actor's interaction with the environment are first stored in an Unroll. Then, when the Unroll is full, or when done=True, the contents of the Unroll are packed and placed in a queue:

    def add(self, transition: Any, done: bool) -> Union[ReplayStructure, None]:
        """Add new transition into storage."""
        self._storage.append(transition)

        if self.full:
            return self._pack_unroll_into_single_transition()
        if done:
            return self._handle_episode_end()
        return None

    def _pack_unroll_into_single_transition(self) -> Union[ReplayStructure, None]:
        """Return a single transition object with transitions stacked with the unroll structure."""
        if not self.full:
            return None

        _sequence = list(self._storage)
        # Save for later use.
        self._last_unroll = copy.deepcopy(_sequence)
        self._storage.clear()

        # Handle overlap between adjacent unroll sequences by carrying the tail over into the next unroll.
        if self._overlap > 0:
            for transition in _sequence[-self._overlap :]:  # noqa: E203
                self._storage.append(transition)
        return self._stack_unroll(_sequence)

    def _handle_episode_end(self) -> Union[ReplayStructure, None]:
        """Handle episode end, incase no cross episodes, try to build a full unroll if last unroll is available."""
        if self._cross_episode:
            return None
        if self.size > 0 and self._last_unroll is not None:
            # In case the episode ends before reaching a full 'unroll length',
            # use whatever we have from the current unroll and fill in the missing steps from the previous sequence.
            _suffix = list(self._storage)
            _prefix_indices = self._full_unroll_length - len(_suffix)
            _prefix = self._last_unroll[-_prefix_indices:]
            _sequence = list(itertools.chain(_prefix, _suffix))
            return self._stack_unroll(_sequence)
        else:
            return None
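
To make sure I read the flow correctly, here is a minimal sketch of how I picture the actor-side loop feeding add(). The import path, the Unroll constructor arguments, and the Transition namedtuple are just assumptions for illustration; the exact signatures in replay.py and the R2D2 agent code may differ.

    import collections
    import queue

    from deep_rl_zoo.replay import Unroll  # import path assumed; adjust to your checkout

    # Hypothetical, simplified transition; the real R2D2 transition carries more
    # fields (e.g. recurrent hidden states).
    Transition = collections.namedtuple('Transition', ['s_t', 'a_t', 'r_t', 'done'])

    # Constructor arguments are inferred from the attributes used in the excerpt
    # (unroll_length, overlap, cross_episode); verify them against replay.py.
    unroll = Unroll(
        unroll_length=30,
        overlap=10,
        structure=Transition(None, None, None, None),
        cross_episode=False,
    )
    out_queue = queue.SimpleQueue()

    # Synthetic 100-step episode standing in for the real actor-environment loop.
    for t in range(100):
        done = t == 99
        packed = unroll.add(Transition(s_t=t, a_t=0, r_t=0.0, done=done), done)
        if packed is not None:
            # A full, stacked unroll is ready: hand it over to the learner/replay side.
            out_queue.put(packed)
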
  1. The first question: does unroll_length have to be smaller than the maximum episode length of the task? And does this value need to be adjusted based on the maximum episode length of each task?

I think unroll_length should be set to less than the maximum episode length of the environment. Otherwise, the Unroll may never fill before done=True, and since self._last_unroll remains None, _handle_episode_end() returns None and nothing from that episode is ever packed (see the small sketch after these questions). I'm unsure about this reasoning and would like your advice, thank you.

  2. I am currently using the MiniGrid environment. For instance, in MiniGrid-MultiroomS2N4 the maximum episode length is 40. I have set unroll_length to 30 and burn_in to 10. Despite running the R2D2 algorithm for one million steps, it has not converged, whereas Rainbow DQN converges within two hundred thousand steps. I am not sure whether you have tested the R2D2 algorithm in MiniGrid-related environments; initially I expected R2D2 to perform better here. This issue has been blocking me for two weeks, so I'm seeking your advice. Thank you very much.
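
To make the concern in my first question concrete, here is a continuation of the hypothetical Transition and Unroll setup from the sketch above (same caveats about the constructor arguments): if unroll_length is larger than the longest possible episode, self.full is never reached, self._last_unroll stays None, and the done=True call falls into _handle_episode_end() and returns None, so nothing from that episode is ever handed over.

    # Unroll longer than the episode: with a 40-step episode and unroll_length=50,
    # add() never returns a packed unroll, not even on the final done=True step.
    short_episode_unroll = Unroll(
        unroll_length=50,
        overlap=0,
        structure=Transition(None, None, None, None),
        cross_episode=False,
    )
    results = [
        short_episode_unroll.add(Transition(s_t=t, a_t=0, r_t=0.0, done=(t == 39)), t == 39)
        for t in range(40)
    ]
    assert all(r is None for r in results)  # the episode's data never reaches the queue
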
michaelnny commented 9 months ago

Hi, glad that you found the work useful.

To answer your first question: yes, in general the unroll length is much smaller than the average episode length, for example around 1/10 of the average episode length. But there is no universally good ratio; you'll have to experiment and find out, depending on the task at hand.
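
For example, one rough way to pick a starting value is to estimate the average episode length with a random policy and take about one tenth of it. The sketch below uses the Gymnasium API with CartPole-v1 purely as a stand-in (for MiniGrid you'd use the minigrid package and its own environment ids, and the environment wrappers in this repo may differ); the 1/10 factor is only a rough starting point, not a tuned value.

    import gymnasium as gym

    env = gym.make('CartPole-v1')  # stand-in environment; substitute your own task

    lengths = []
    for _ in range(100):
        env.reset()
        steps, done = 0, False
        while not done:
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            done = terminated or truncated
            steps += 1
        lengths.append(steps)

    avg_len = sum(lengths) / len(lengths)
    print(f'average episode length ~ {avg_len:.1f}, '
          f'suggested starting unroll_length ~ {max(2, round(avg_len / 10))}')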

Regarding the second question, unfortunately I don't have prior experience with the MiniGrid environments. My advice is to try a much smaller unroll length, or better yet, try the NGU agent (or Agent57) instead of R2D2. NGU and Agent57 are much better at hard-exploration problems, especially when rewards are sparse; I suspect your task might be one of those (though I'm not sure).

One last thing to point out: it's also possible that there are some unknown bugs in our work, so it may not behave as intended. If you find one, feel free to report it.