Hold off on neuroevolution for now. It may be too complicated to incorporate when training an agent in a 2-player scenario (I'll also admit I am new to both reinforcement learning and neuroevolution).
Decide if I should use value-based, policy-based, or model-based RL methods
Choosing policy-based methods. They seem better suited to the problem because the action space is continuous and the agent needs to perform several actions simultaneously.
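To make the reasoning about continuous, simultaneous actions concrete, here is a minimal sketch of a policy-based agent in plain Python. Everything in it is an assumption for illustration: the class name `GaussianPolicy`, the linear mean, the fixed standard deviation, and the REINFORCE-style update are stand-ins, not the method these notes have settled on. The point is only that a policy can output one real-valued number per action dimension and sample all of them in a single step, which a value-based method over a discrete action table cannot do directly.

```python
import math
import random


class GaussianPolicy:
    """Hypothetical linear Gaussian policy with one mean per action dimension."""

    def __init__(self, obs_dim, act_dim, std=0.5, seed=0):
        self.rng = random.Random(seed)
        self.std = std  # fixed exploration noise (an assumption, not learned here)
        # weights[i][j]: contribution of observation j to the mean of action i
        self.weights = [[0.0] * obs_dim for _ in range(act_dim)]

    def mean(self, obs):
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

    def sample(self, obs):
        # Every action dimension is sampled in the same step, so the agent
        # can perform several continuous actions simultaneously.
        return [self.rng.gauss(m, self.std) for m in self.mean(obs)]

    def log_prob(self, obs, action):
        # Log-density of an independent Gaussian per action dimension.
        var = self.std ** 2
        return sum(
            -0.5 * ((a - m) ** 2 / var + math.log(2 * math.pi * var))
            for a, m in zip(action, self.mean(obs))
        )

    def reinforce_step(self, obs, action, ret, lr=0.01):
        # REINFORCE-style update: move the mean toward actions that were
        # followed by a positive return `ret`, away if it was negative.
        # d log_prob / d weights[i][j] = (a_i - mean_i) / var * obs_j
        var = self.std ** 2
        means = self.mean(obs)
        for i, (a, m) in enumerate(zip(action, means)):
            for j, o in enumerate(obs):
                self.weights[i][j] += lr * ret * (a - m) / var * o
```

A usage sketch: `policy = GaussianPolicy(obs_dim=3, act_dim=2)` followed by `policy.sample([1.0, 0.0, -1.0])` returns a list of two floats, one per action dimension. In a 2-player setting each player could hold its own policy object, though how the opponent is handled is left open here.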