xin-pu / DeepSharp

Secondary development based on TorchSharp for Deep Learning and Reinforcement Learning
MIT License

Deep RL Algorithms #4

Open GeorgeS2019 opened 1 year ago

GeorgeS2019 commented 1 year ago

Current status:

RL Algorithms

Model Free (TorchSharp)

Also look into BaseAlgorithm, the base class on which the state-of-the-art RL algorithms depend.

RL Algorithms


These algorithms fall into two groups, on-policy and off-policy, both of which inherit from BaseAlgorithm:

  • on_policy_algorithm.py
  • off_policy_algorithm.py
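The split between the two groups can be sketched in Python as follows. This is a minimal, hypothetical illustration and not the code in on_policy_algorithm.py or off_policy_algorithm.py. The class names mirror the ones above, but collect_experience, train, n_steps, buffer_size and train_freq are assumptions chosen for the example, and the "transitions" are placeholder dictionaries rather than real environment data. What it shows is the structural difference: an on-policy learner discards its rollout after every update, while an off-policy learner keeps a bounded replay buffer of older transitions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAlgorithm(ABC):
    """Shared state and training loop (hypothetical minimal version)."""

    def __init__(self, learning_rate: float, seed: int = 0) -> None:
        self.learning_rate = learning_rate
        self.seed = seed
        self.num_timesteps = 0

    def learn(self, total_timesteps: int) -> "BaseAlgorithm":
        # Alternate between gathering experience and updating the model
        while self.num_timesteps < total_timesteps:
            self.num_timesteps += self.collect_experience()
            self.train()
        return self

    @abstractmethod
    def collect_experience(self) -> int:
        """Interact with the environment; return the number of steps taken."""

    @abstractmethod
    def train(self) -> None:
        """Update the model from the stored experience."""


class OnPolicyAlgorithm(BaseAlgorithm):
    """Trains only on data collected by the current policy (e.g. A2C, PPO)."""

    def __init__(self, learning_rate: float, n_steps: int, seed: int = 0) -> None:
        super().__init__(learning_rate, seed)
        self.n_steps = n_steps
        self.rollout_buffer: List[Dict[str, Any]] = []

    def collect_experience(self) -> int:
        self.rollout_buffer.clear()  # old data came from an older policy: discard it
        for step in range(self.n_steps):
            self.rollout_buffer.append({"step": self.num_timesteps + step})
        return self.n_steps

    def train(self) -> None:
        pass  # a real algorithm would compute a policy-gradient loss here


class OffPolicyAlgorithm(BaseAlgorithm):
    """Reuses past transitions from a replay buffer (e.g. DQN, SAC, TD3)."""

    def __init__(self, learning_rate: float, buffer_size: int, train_freq: int, seed: int = 0) -> None:
        super().__init__(learning_rate, seed)
        self.buffer_size = buffer_size
        self.train_freq = train_freq
        self.replay_buffer: List[Dict[str, Any]] = []

    def collect_experience(self) -> int:
        for step in range(self.train_freq):
            self.replay_buffer.append({"step": self.num_timesteps + step})
            if len(self.replay_buffer) > self.buffer_size:
                self.replay_buffer.pop(0)  # drop the oldest transition (FIFO)
        return self.train_freq

    def train(self) -> None:
        pass  # a real algorithm would sample a minibatch and do a TD update here
```

Keeping the outer learn() loop in the base class is only one possible layout; it lets each subclass change how experience is gathered and consumed without duplicating the loop.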

    """
    The base of RL algorithms
    
    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
    :param env: The environment to learn from
                (if registered in Gym, can be str. Can be None for loading trained models)
    :param learning_rate: learning rate for the optimizer,
        it can be a function of the current progress remaining (from 1 to 0)
    :param policy_kwargs: Additional arguments to be passed to the policy on creation
    :param stats_window_size: Window size for the rollout logging, specifying the number of episodes to average
        the reported success rate, mean episode length, and mean reward over
    :param tensorboard_log: the log location for tensorboard (if None, no logging)
    :param verbose: Verbosity level: 0 for no output, 1 for info messages (such as device or wrappers used), 2 for
        debug messages
    :param device: Device on which the code should run.
        By default, it will try to use a CUDA-compatible device and fall back to CPU
        if that is not possible.
    :param support_multi_env: Whether the algorithm supports training
        with multiple environments (as in A2C)
    :param monitor_wrapper: When creating an environment, whether to wrap it
        or not in a Monitor wrapper.
    :param seed: Seed for the pseudo random generators
    :param use_sde: Whether to use generalized State Dependent Exploration (gSDE)
        instead of action noise exploration (default: False)
    :param sde_sample_freq: Sample a new noise matrix every n steps when using gSDE
        Default: -1 (only sample at the beginning of the rollout)
    :param supported_action_spaces: The action spaces supported by the algorithm.
    """