Deep Reinforcement Learning for mobile robot navigation in the ROS Gazebo simulator. Using a Twin Delayed Deep Deterministic Policy Gradient (TD3) network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
https://github.com/reiniscimurs/DRL-robot-navigation/blob/dea37acfc65f702f7fa792787e09602416cf85d4/TD3/velodyne_td3.py#L76
The code above assembles the features from the action and the features from the state. Is this trick a normal operation in the Actor-Critic framework? Could you explain more about the motivation? Why not use the average of self.layer_2_s(s1) and self.layer_2_a(a)?
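For context, here is a minimal NumPy sketch (made-up layer sizes, not the repo's actual code) of why averaging the two feature vectors instead of summing them would not change the family of functions the critic can represent: the 0.5 factor is a constant scale that the learned weights can absorb.

```python
import numpy as np

# Hypothetical late-fusion step as used in many DDPG/TD3-style critics:
#   h = relu(W_s @ s + W_a @ a + b)
# where W_s projects state features and W_a projects the action into a
# shared feature space before the nonlinearity.

rng = np.random.default_rng(0)
W_s = rng.standard_normal((8, 4))   # state-feature weights (illustrative shape)
W_a = rng.standard_normal((8, 2))   # action-feature weights (illustrative shape)
s = rng.standard_normal(4)          # state feature vector
a = rng.standard_normal(2)          # action vector

fused_sum = W_s @ s + W_a @ a        # the summation trick
fused_avg = 0.5 * (W_s @ s + W_a @ a)  # averaging the two feature vectors

# Averaging is exactly the sum computed with halved weights, so gradient
# descent can reach an equivalent solution either way:
equivalent = np.allclose(fused_avg, (0.5 * W_s) @ s + (0.5 * W_a) @ a)
print(equivalent)
```

In other words, the sum is the conventional choice because averaging two linear projections only rescales them by a constant, which the optimizer can fold into `W_s` and `W_a` anyway.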