Reinforcement Learning Capability

adding some brainstorming notes:

Goal: find the simulation matrix configuration that minimizes rescue time.

State representation: the output of a particular run (rescue time? sector coverage?)
Action: modify drone behaviors in simulation_matrix.xlsx
Reward: @Reddy2, can you review? One idea is to compare the actual rescue time against a baseline. Another is to base the reward on whether a drone's sensor coverage overlapped a sector that was already measured.
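
To make the two reward ideas concrete, here is a rough sketch in Python. Nothing here is settled: the function names and inputs are hypothetical, the baseline reward is normalized by the baseline time as one possible choice, and the coverage reward assumes that overlapping an already-measured sector counts as wasted effort (the notes above leave that open).

```python
def baseline_reward(rescue_time, baseline_time):
    """Reward relative to a baseline run (hypothetical helper).

    Positive when the run was faster than the baseline, negative when slower.
    Dividing by the baseline keeps the value unitless.
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (baseline_time - rescue_time) / baseline_time


def coverage_reward(visited_sectors, already_measured):
    """Fraction of visited sectors that had not been measured before.

    Assumption: re-covering a previously measured sector earns nothing.
    """
    visited = set(visited_sectors)
    if not visited:
        return 0.0
    new_sectors = visited - set(already_measured)
    return len(new_sectors) / len(visited)
```

For example, a run that finishes in 90 time units against a baseline of 100 gets a baseline reward of about 0.1, and a drone that visits four sectors of which two were already measured gets a coverage reward of 0.5. The two could also be combined with a weight, which would be another thing to tune.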

implementation brainstorm:

dronelab controller: creates and runs an instance of dronelab

example: controller.go sim="C:\Users\P008702F\Desktop\Simulation_Matrix.xlsx" config="C:\Users\P008702F\Desktop\config.txt"

output: C:\Users\P008702F\dronelab_89df2b2e-fb31-47ba-8dd6-64a33a38e55c
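
A rough Python sketch of how a controller could handle the key=value arguments shown in the example and pick a uniquely named output folder. The function names, the `base` parameter, and the use of a random UUID are assumptions for illustration only; the notes do not say how the actual controller does this.

```python
import uuid
from pathlib import Path


def parse_args(argv):
    """Parse arguments of the form key=value, e.g. sim=... config=...

    Surrounding double quotes on the value are stripped.
    """
    args = {}
    for item in argv:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got: {item!r}")
        args[key] = value.strip('"')
    return args


def new_run_dir(base):
    """Build a path like <base>/dronelab_<uuid> (assumed naming scheme).

    Only builds the path; it does not create the directory.
    """
    return Path(base) / f"dronelab_{uuid.uuid4()}"
```

Giving each run its own folder would let the RL loop launch many configurations without one run overwriting the results of another.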

rdarnold / dronelab