xin-pu / DeepSharp

Secondary development on TorchSharp for Deep Learning and Reinforcement Learning
MIT License

强化学习 or Reinforcement Learning #8

Open GeorgeS2019 opened 1 year ago

GeorgeS2019 commented 1 year ago

WIP: English version using Mermaid


mindmap
  root[Reinforcement<br/>Learning]
    (Definitions)
      Interactions
        Environment
        Agent
      ((Elements))
        State
        Action
        Policy<br/>策略
          Deterministic Policy<br/>确定性策略
          Stochastic Policy<br/>随机性策略
        State transition probability<br/>状态转移概率
        Immediate Reward<br/>即时奖励
      Others
        Episodes
        Trial
        Continuing Tasks
    {{Policy}}
      Policy-based learning
      Value-based learning
        Monte Carlo learning
          Temporal Difference Learning
            SARSA<br/>State Action Reward State Action
            Q-Learning
        Dynamic programming learning
          Policy iteration algorithm
            Policy Evaluation
            Policy Improvement
          Value iteration algorithm
    Markov Decision Process
      Markov Decision Process<br/>马尔科夫决策过程
        Trajectory<br/>轨迹
      Markov Process<br/>马尔科夫过程
    Objective Functions
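
To make the SARSA and Q-Learning leaves of the mindmap concrete, here is a minimal tabular sketch in Python of the two temporal-difference updates. It is not taken from DeepSharp: the function names, the dictionary-based Q-table, and the default `alpha` and `gamma` values are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical tabular representation: Q[(state, action)] -> value.
QTable = Dict[Tuple[Hashable, Hashable], float]


def sarsa_update(q: QTable, s, a, r: float, s_next, a_next,
                 alpha: float = 0.1, gamma: float = 0.9) -> float:
    """On-policy TD update: bootstrap from the action actually chosen next."""
    target = r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


def q_learning_update(q: QTable, s, a, r: float, s_next,
                      actions: Iterable, alpha: float = 0.1,
                      gamma: float = 0.9) -> float:
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    target = r + gamma * best_next
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]
```

The only difference between the two is the bootstrap term: SARSA uses the value of the next action the agent really takes, while Q-Learning uses the maximum over the actions available in the next state.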

GeorgeS2019 commented 1 year ago

Question: where does the multi-armed bandit belong on the mindmap?

xin-pu commented 1 year ago

In my structure, it's a specific environment, so it belongs under Environment.

GeorgeS2019 commented 1 year ago

I agree. However, in other discussions it is treated as a special kind of RL problem centered on the trade-off between exploration and exploitation.
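
As a rough illustration of that exploration/exploitation view, below is a minimal epsilon-greedy bandit sketch in Python. Nothing here comes from DeepSharp or TorchSharp: the function name `run_epsilon_greedy`, the Gaussian reward model with unit variance, and the default parameters are assumptions chosen only to keep the example small.

```python
import random
from typing import List, Tuple


def run_epsilon_greedy(true_means: List[float], steps: int = 2000,
                       epsilon: float = 0.1,
                       seed: int = 0) -> Tuple[List[float], List[int]]:
    """Play a multi-armed bandit with an epsilon-greedy rule.

    Returns the estimated mean reward and the pull count for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda i: estimates[i])

        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update of the estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts
```

There is only one state here, so the whole problem reduces to choosing between arms. That is why it can be read either as a specific environment or as a stripped-down RL problem that isolates exploration versus exploitation.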

GeorgeS2019 commented 1 year ago

@xin-pu Try to make your mindmap bilingual, as the .NET team will start looking at your work.

xin-pu commented 1 year ago

@GeorgeS2019 Alright.

GeorgeS2019 commented 1 year ago

If you have any questions about TorchSharp, ask @NiklasGustafsson (https://github.com/NiklasGustafsson). He is impressed with what you are doing.

See discussion

xin-pu commented 1 year ago

@GeorgeS2019 Thanks.