Test the gym env:
Use the original data to test the env autoregressively
Choose 5 trips, transform the data, feed it to the model, transform the outputs back to the original space, and use them as the input for the next prediction
Compare by plotting the trajectories and computing the MSE for 5-10 different routes (a rollout sketch follows this list)
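A minimal sketch of this autoregressive test, assuming an sklearn-style scaler (transform/inverse_transform), a one-step model.predict, and trips stored as arrays with one raw state per row; all of these names are assumptions, not the project's actual interface:

```python
import numpy as np

def autoregressive_rollout(model, scaler, trip):
    """Roll the env model forward on one trip, feeding each
    inverse-transformed prediction back in as the next input."""
    preds = []
    x = trip[0]                                   # recorded initial state
    for _ in range(len(trip) - 1):
        z = scaler.transform(x.reshape(1, -1))    # raw -> model space
        z_next = model.predict(z)                 # one-step prediction
        x = scaler.inverse_transform(z_next)[0]   # model -> raw space
        preds.append(x)
    return np.array(preds)

def trip_mse(model, scaler, trip):
    """MSE between the autoregressive rollout and the recorded trip."""
    preds = autoregressive_rollout(model, scaler, trip)
    return float(np.mean((preds - trip[1:]) ** 2))

# e.g. mses = [trip_mse(model, scaler, t) for t in trips[:5]]
```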
Test the policy optimizer:
Use normal trips only
Input: X(t-torque, t) + disturbance
Rewards: R(t) vs. cumulative vs. R(t-torque, t)
Using R(t) should be enough for now
Done reward: give a large reward (3 * the normal per-step reward) at done
Termination: terminate at a fixed time step (e.g., 125)
Epoch size: include at least 50 trips per epoch; consider mini-batches to speed up training
Compute the average overall reward (discounted with gamma)
Train at least five times, then run supervised learning
Look at the overall reward to see how well the training works (see the rollout/return sketch after this list)
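A minimal sketch of one training epoch and the discounted overall reward, assuming a gym-style env with the classic 4-tuple step() API; GAMMA, the policy interface, and the "normal" reward scale for the done bonus (here the mean absolute per-step reward) are assumptions:

```python
import numpy as np

GAMMA = 0.99       # discount factor (assumed value)
MAX_STEPS = 125    # fixed termination time, per the notes
DONE_BONUS = 3.0   # done reward = 3 * the normal per-step reward

def discounted_return(rewards, gamma=GAMMA):
    """Overall reward of one trip, discounted with gamma."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def run_epoch(env, policy, n_trips=50):
    """Collect at least 50 trips and return the average overall reward."""
    returns = []
    for _ in range(n_trips):
        obs, rewards = env.reset(), []
        for _ in range(MAX_STEPS):
            obs, r, done, _ = env.step(policy(obs))  # per-step reward R(t)
            rewards.append(r)
            if done:
                # large terminal bonus: 3 * the typical per-step magnitude
                rewards[-1] += DONE_BONUS * float(np.mean(np.abs(rewards)))
                break
        returns.append(discounted_return(rewards))
    return float(np.mean(returns))
```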
Curriculum learning:
Stage 1: imitation learning
Mimic the recorded trips
About 7 epochs
Stage 2:
Compare with the best 1% of trips for the reward
About 7 epochs
Stage 3:
Conceptual rewards only (the fuel-consumption (fc) reward and the time reward); a stage-scheduling sketch follows
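A minimal sketch of the three-stage schedule and per-stage rewards; the info field names, the stage-2 shaping, and the 7-epoch stage lengths are assumptions loosely following the notes:

```python
def curriculum_reward(stage, info):
    """Stage-dependent reward; all info fields are hypothetical."""
    if stage == 1:
        # imitation: penalize deviation from the logged expert action
        return -abs(info["action"] - info["expert_action"])
    if stage == 2:
        # one possible shaping: per-step reward relative to the
        # best-1%-trip benchmark
        return info["step_reward"] - info["best_1pct_step_reward"]
    # stage 3: conceptual rewards only (fuel consumption + time)
    return -(info["fc"] + info["time_cost"])

STAGE_SCHEDULE = [(1, 7), (2, 7), (3, 7)]  # (stage, ~epochs)

def stage_for_epoch(epoch):
    """Map a 0-indexed training epoch to its curriculum stage."""
    boundary = 0
    for stage, n_epochs in STAGE_SCHEDULE:
        boundary += n_epochs
        if epoch < boundary:
            return stage
    return STAGE_SCHEDULE[-1][0]
```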
Evaluate the model:
Choose one of the top-10-percent trips and compare it step by step to see whether the optimizer works
Compute the average reward over all trips in the dataset at different epochs and compare them (see the sketch below)
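A minimal sketch of both evaluation steps; evaluate_trip stands in for whatever rollout-and-score routine the project already has, and the top-decile cut assumes logged per-trip overall rewards are available:

```python
import numpy as np

def average_reward(evaluate_trip, trips):
    """Mean overall reward over every trip in the dataset; run this
    with policies from different epochs and compare the numbers."""
    return float(np.mean([evaluate_trip(t) for t in trips]))

def top_decile_indices(trip_returns):
    """Indices of the top 10% of trips by logged overall reward;
    pick one of these for the step-by-step comparison."""
    cutoff = np.percentile(trip_returns, 90)
    return [i for i, g in enumerate(trip_returns) if g >= cutoff]
```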
Schedule:
Get the results of env testing by the end of today
Get the optimization results ready on Monday, then meet to discuss them