The following articles were consulted to complete this project.
The control actions at each time step were initialized by modeling the problem as an infinite-horizon LQR. The reaching trajectories produced by this control are unsatisfactory, as shown in the initial animation and in the figure below (red trajectories).
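As a rough illustration of an infinite-horizon LQR initialization, the sketch below computes a steady-state feedback gain by iterating the discrete-time Riccati recursion and rolls out the resulting controls toward a target. It is not this project's model: the 1-D point-mass dynamics, the weights `Q` and `R`, the time step `dt`, and the helper names `dlqr_gain` and `lqr_rollout` are all assumptions made for the example.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, tol=1e-10, max_iter=10000):
    """Iterate the discrete-time Riccati recursion until P stops changing."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Steady-state gain for the control law u = -K (x - target)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

def lqr_rollout(x0, target, K, A, B, n_steps):
    """Apply u = -K (x - target) for n_steps and record states and controls."""
    xs, us = [x0.copy()], []
    for _ in range(n_steps):
        u = -K @ (xs[-1] - target)
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(us)

# Hypothetical 1-D point mass: state = [position, velocity], control = force
dt = 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])      # assumed state weights
R = np.array([[1e-3]])       # assumed control weight

K, P = dlqr_gain(A, B, Q, R)
target = np.array([1.0, 0.0])  # rest at position 1 (an equilibrium of this model)
xs, us = lqr_rollout(np.zeros(2), target, K, A, B, n_steps=200)
```

The control sequence `us` from such a rollout is the kind of per-time-step initialization that a trajectory optimizer can then refine. For a nonlinear plant, the LQR gain would typically be computed from a linearization, which is one plausible reason the resulting trajectories fall short.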
The control actions were then optimized iteratively using Differential Dynamic Programming, as described in (2). The final control solutions, shown in the figure below (blue trajectories) and animated in the final animation, were reached in under 10 iterations for all targets. Trajectories from selected intermediate iterations are shown in grey (dark to light with increasing iteration number).
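The iterative refinement can be sketched as a backward pass that computes feedforward and feedback gains, followed by a forward rollout with a simple line search. The example below is a minimal sketch, not the method of (2): it uses the iLQR (Gauss-Newton) simplification of DDP, which drops the second-order dynamics terms, and it runs on a hypothetical damped pendulum. The constants `DT`, `G`, `DAMP`, `R_U`, `QF`, the zero initial controls, and all function names are assumptions for illustration.

```python
import numpy as np

DT, G, DAMP, R_U = 0.02, 9.81, 0.1, 1e-3  # hypothetical pendulum and cost parameters
QF = np.diag([100.0, 10.0])               # hypothetical terminal cost weights

def step(x, u):
    """One Euler step of a unit-mass, unit-length damped pendulum."""
    theta, omega = x
    return np.array([theta + DT * omega,
                     omega + DT * (u[0] - G * np.sin(theta) - DAMP * omega)])

def jacobians(x):
    """Derivatives of step() with respect to state and control."""
    fx = np.array([[1.0, DT],
                   [-DT * G * np.cos(x[0]), 1.0 - DT * DAMP]])
    fu = np.array([[0.0], [DT]])
    return fx, fu

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return np.array(xs)

def total_cost(xs, us, target):
    err = xs[-1] - target
    return 0.5 * R_U * np.sum(us ** 2) + 0.5 * err @ QF @ err

def backward_pass(xs, us, target, reg=1e-6):
    """Propagate the quadratic value model backward and collect gains."""
    Vx, Vxx = QF @ (xs[-1] - target), QF.copy()
    ks, Ks = [], []
    for t in reversed(range(len(us))):
        fx, fu = jacobians(xs[t])
        Qx = fx.T @ Vx
        Qu = R_U * us[t] + fu.T @ Vx
        Qxx = fx.T @ Vxx @ fx
        Quu = (R_U + reg) * np.eye(1) + fu.T @ Vxx @ fu
        Qux = fu.T @ Vxx @ fx
        k = -np.linalg.solve(Quu, Qu)    # feedforward correction
        K = -np.linalg.solve(Quu, Qux)   # feedback gain
        Vx = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
        Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
        ks.append(k)
        Ks.append(K)
    return ks[::-1], Ks[::-1]

def forward_pass(xs, us, ks, Ks, alpha):
    """Roll out the updated controls around the previous trajectory."""
    new_xs, new_us = [xs[0]], []
    for t in range(len(us)):
        u = us[t] + alpha * ks[t] + Ks[t] @ (new_xs[-1] - xs[t])
        new_us.append(u)
        new_xs.append(step(new_xs[-1], u))
    return np.array(new_xs), np.array(new_us)

def ilqr(x0, target, us, n_iters=10):
    xs = rollout(x0, us)
    costs = [total_cost(xs, us, target)]
    for _ in range(n_iters):
        ks, Ks = backward_pass(xs, us, target)
        for alpha in 0.5 ** np.arange(10):   # backtracking line search
            new_xs, new_us = forward_pass(xs, us, ks, Ks, alpha)
            new_cost = total_cost(new_xs, new_us, target)
            if new_cost < costs[-1]:
                xs, us = new_xs, new_us
                costs.append(new_cost)
                break
        else:
            break                            # no step improved the cost
    return xs, us, costs

x0 = np.zeros(2)
target = np.array([np.pi / 4, 0.0])
us0 = np.zeros((100, 1))                     # assumed initial guess
xs, us, costs = ilqr(x0, target, us0)
```

Here `costs` records the total cost after each accepted iteration, so it decreases monotonically by construction. Passing a better initial control sequence as `us0` gives the optimizer a warmer start than the zero controls assumed in this sketch.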