tianxiaoy9 / Data-driven-deep-reinforcement-learning-controller-for-DC-DC-buck-converter-feeding-CPLs

Source code for deep reinforcement learning in MATLAB

about the module #1

Open ydxuesheng opened 7 months ago

ydxuesheng commented 7 months ago

Hello, I have learned a lot from your materials. May I ask what the roles of du/dt and de/dt are in the model? How are they generated as observations and used as state values?

tianxiaoy9 commented 6 months ago

Thank you for your question. From the perspective of physical quantities, a derivative represents the rate of change of that variable. This controller has two control objectives: one is to keep the tracking error as small as possible, and the other is to keep the settling time as short as possible. These two states are therefore used to ensure a fast recovery speed.
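For illustration only, here is a minimal MATLAB sketch (not the authors' code) of how such an observation vector could be assembled at each control step. The sampling period Ts, the reference Vref, and the signal names vOut, ePrev, uPrev are assumptions made for this example.

```matlab
% Minimal sketch, not the authors' implementation: forming an observation
% vector containing the tracking error and backward-difference
% approximations of de/dt and du/dt. All names and values are assumed.
Ts   = 1e-4;                  % sampling period (assumed)
Vref = 30;                    % reference output voltage (assumed)

e     = Vref - vOut;          % voltage tracking error at this step
de_dt = (e - ePrev) / Ts;     % rate of change of the error
du_dt = (vOut - uPrev) / Ts;  % rate of change of the output voltage

obs = [e; de_dt; du_dt];      % observation/state vector passed to the agent

ePrev = e;                    % store values for the next step
uPrev = vOut;
```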

ydxuesheng commented 6 months ago

Glad to receive your reply! I'll think about that a little more. At the same time, may I ask you another question: what is the role of the recombination module in the sim-to-real process? Secondly, how does the duty cycle in the simulation test affect the duty cycle in the real setup? Are they simply connected through a mapping function?


tianxiaoy9 commented 6 months ago

Your questions are very interesting; the description in the article may not be especially detailed.

For the first question, as shown in Fig. 6 of the second paper: in Step 1 the agent is trained offline and the detailed information about the DNN is stored in a MATLAB module. At this point the agent is able to control the system on the MATLAB/Simulink platform. The control signal is produced by the MATLAB module and is a discrete constant. This digital signal then generates a PWM signal with a specific duty ratio through the signal-generation module (in Simulink, the DC-DC PWM generator). However, this module is unavailable on dSPACE or other processors. In Step 2, the DNN is reorganized as a MATLAB function and no longer needs the reward-function block. In this step the agent (the reorganized DNN) reads the state of the system and outputs the appropriate action.

For the second question, during training only the nominal values of the inductance and capacitance are considered (i.e., 1 mH, 1 mF). If the agent is applied directly to the experimental setup, the system may become unstable. After applying the proposed DRM function, the steady-state error of the voltage is reduced significantly and the voltage fluctuation decreases compared with the situation without the DRM. You can think of the practical controller as being implemented by the trained agent plus the DRM, where the DRM acts as a feedforward loop that corrects the deviation.
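As a rough illustration of Step 2, the sketch below (not the authors' exact workflow) shows one way a trained Reinforcement Learning Toolbox agent can be exported as a standalone policy for use in a MATLAB Function block, with the DRM applied as a feedforward correction on the duty cycle. The file name trainedAgent.mat, the variable agent, the observation layout, and the correction term d_drm are assumptions for this example.

```matlab
% Minimal sketch, not the authors' code: exporting the trained agent as a
% standalone policy (Step 2) and applying a feedforward correction.
load('trainedAgent.mat','agent');   % agent trained offline in Step 1 (assumed file/variable)
generatePolicyFunction(agent);      % creates evaluatePolicy.m and agentData.mat
                                    % (Reinforcement Learning Toolbox)

% At run time, inside a MATLAB Function block (no reward block needed):
d_agent = evaluatePolicy(obs);      % obs = [e; de_dt; du_dt] as above (assumed)
d_cmd   = d_agent + d_drm;          % DRM as a feedforward correction (assumed form)
d_cmd   = min(max(d_cmd, 0), 1);    % clamp the duty cycle to a valid range
```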