Based on the available expert controller design examples (PI-based inner current/voltage control + droop control for power sharing), it is of particular interest to highlight the advantages and shortcomings of applying state-of-the-art RL algorithms as a replacement for the expert-based controllers.
The task is therefore to implement a contemporary, model-free, completely data-driven RL control approach for continuous states and actions that learns an optimal control policy from scratch (e.g., DDPG).
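As a concrete starting point, a minimal training sketch using the DDPG implementation from stable-baselines3 could look as follows; note that `Pendulum-v1` serves only as a generic continuous-control stand-in for the actual microgrid environment, and all hyperparameter values are placeholders rather than a tuned configuration:

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise

# Placeholder environment; to be replaced by the inverter environment
# (see the single-inverter sketch further below).
env = gym.make("Pendulum-v1")
n_actions = env.action_space.shape[0]

# Temporally correlated exploration noise on the continuous actions,
# as is common for DDPG.
action_noise = OrnsteinUhlenbeckActionNoise(
    mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
)

model = DDPG(
    "MlpPolicy",          # fully connected actor and critic networks
    env,
    action_noise=action_noise,
    learning_rate=1e-3,   # illustrative value, not tuned
    buffer_size=100_000,  # replay buffer for off-policy learning
    verbose=1,
)
model.learn(total_timesteps=50_000)  # learn a control policy from scratch
```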
As with the implementation of the expert controllers, the RL approach should first target the simplest problem (single-inverter current control) and from there be extended to further, more complex tasks.
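To illustrate what this simplest task could look like, the following is a hypothetical Gymnasium environment skeleton for single-inverter current control; the class name, the simplified first-order R-L dynamics (no cross-coupling), and all parameter values are illustrative assumptions, not a model of a concrete plant:

```python
import gymnasium as gym
import numpy as np


class SingleInverterCurrentEnv(gym.Env):
    """Minimal sketch: one inverter driving an R-L filter, with the d-q
    currents as states and the applied voltages as actions. All parameter
    values are illustrative."""

    def __init__(self, R=0.5, L=2.3e-3, v_dc=600.0, dt=1e-4, i_ref=10.0):
        self.R, self.L, self.v_dc, self.dt, self.i_ref = R, L, v_dc, dt, i_ref
        # Observation: d-q currents plus the current reference
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(3,))
        # Action: normalized d-q voltage commands in [-1, 1]
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.i = np.zeros(2)  # start from zero current
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        v = np.clip(action, -1.0, 1.0) * self.v_dc / 2  # applied voltages
        # Explicit Euler step of di/dt = (v - R*i) / L (no cross-coupling)
        self.i += self.dt * (v - self.R * self.i) / self.L
        err = self.i[0] - self.i_ref   # track the reference on the d-axis
        reward = -err**2               # quadratic tracking penalty
        self.steps += 1
        truncated = self.steps >= 1000  # fixed-length episodes
        return self._obs(), reward, False, truncated, {}

    def _obs(self):
        return np.array([self.i[0], self.i[1], self.i_ref], dtype=np.float32)
```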
It can be assumed that safety-critical states (e.g., overcurrent) will occur during training of the RL controller, and perhaps even after its convergence to a local optimum. An essential object of investigation is therefore to monitor how often, and in which particular situations, the RL controller drives the system into unsafe states.
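One way to carry out this monitoring is an environment wrapper that records every limit violation during training. The sketch below assumes an environment whose first two observation entries are the d-q currents (as in the skeleton above); the current limit is an arbitrary example value:

```python
import gymnasium as gym
import numpy as np


class OvercurrentMonitor(gym.Wrapper):
    """Hypothetical wrapper logging how often and in which situations the
    agent drives the plant into an unsafe (overcurrent) state."""

    def __init__(self, env, i_limit=15.0):
        super().__init__(env)
        self.i_limit = i_limit
        self.t = 0
        self.violations = []  # (step, current magnitude) per unsafe event

    def reset(self, **kwargs):
        self.t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.t += 1
        i_mag = float(np.linalg.norm(obs[:2]))  # d-q current magnitude
        if i_mag > self.i_limit:
            # Record when and how severely the limit was exceeded
            self.violations.append((self.t, i_mag))
            info["overcurrent"] = True
        return obs, reward, terminated, truncated, info


# Usage sketch: wrap the training environment and inspect the log afterwards
env = OvercurrentMonitor(SingleInverterCurrentEnv())
```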
Finally, the converged/learned RL controller should be compared against the previous examples based on standard control approaches, using the usual performance metrics (integrated mean-squared control error, mean absolute control error, ...).
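A small helper along the following lines could compute these metrics from logged episode data; the exact definitions (e.g., whether the squared error is averaged or time-integrated) are assumptions that should be fixed consistently across all compared controllers:

```python
import numpy as np


def control_error_metrics(i_ref, i_meas, dt):
    """Tracking metrics for comparing the RL and expert controllers.
    i_ref, i_meas: reference and measured signals over one episode;
    dt: sampling period used for the time integration."""
    err = np.asarray(i_meas) - np.asarray(i_ref)
    return {
        "MSE": float(np.mean(err**2)),        # mean squared control error
        "MAE": float(np.mean(np.abs(err))),   # mean absolute control error
        "ISE": float(np.sum(err**2) * dt),    # time-integrated squared error
    }
```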