pagand / ORL_optimizer

offline RL optimizer

Sprint meeting 4 #10

Closed pagand closed 1 month ago

pagand commented 1 month ago

Jack: using PyQt for the simulator. Added pages and labels for the values.

Elena: still working on the data preparation. Suggestion for the trip countdown: find the closest full time that is less than or equal to the episode clock time. Double-check later for consistency.
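The countdown lookup above amounts to a "largest value ≤ query" search over sorted times. A minimal sketch, assuming the full times are kept in a sorted NumPy array (the names `full_times` and `closest_leq_time` are hypothetical, not from the project):

```python
import numpy as np

def closest_leq_time(full_times, clock_time):
    """Return the largest recorded full time that is <= clock_time.

    full_times: sorted 1-D array of trip full times (hypothetical name).
    clock_time: current episode clock time.
    """
    # searchsorted(side="right") gives the insertion point after any
    # equal entries, so idx points at the closest value <= clock_time
    idx = np.searchsorted(full_times, clock_time, side="right") - 1
    if idx < 0:
        raise ValueError("clock_time precedes all recorded full times")
    return full_times[idx]

# full times at [0, 10, 20, 30]; clock time 17 -> closest <= is 10
print(closest_leq_time(np.array([0, 10, 20, 30]), 17))
```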

James: a JAX implementation of next state, reward, and done. Suggested separating training from the forward path and putting the checkpoint in the same folder. A simpler approach was suggested: we can assume the actions and states at test time follow the same distribution as the offline data. There is no need to predict the reward as a separate head; since the reward formula is known, just plug the estimated next state into it to get the reward. Push to a GitHub branch and compare with the actual Gym data. Avoid accumulated error, as a typical run is around 1000 steps. Look at paper 2 for insight on the adjustment.
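The forward path James describes can be sketched in JAX. This is a minimal illustration, not the project's code: the dynamics network and the reward formula here are stand-ins (the real HalfCheetah reward uses the simulator's x-velocity and control cost), and the point is only the structure, where the model predicts the next state and the reward is computed from a known formula rather than a separate head:

```python
import jax
import jax.numpy as jnp

def dynamics_apply(params, state, action):
    # stand-in for a trained dynamics network: predicts next state only
    x = jnp.concatenate([state, action])
    return jnp.tanh(params["W"] @ x + params["b"])

def reward_fn(state, action, next_state):
    # illustrative reward formula in the HalfCheetah style:
    # forward progress minus a control cost (exact form is an assumption)
    dt = 0.05
    forward = (next_state[0] - state[0]) / dt
    ctrl_cost = 0.1 * jnp.sum(action ** 2)
    return forward - ctrl_cost

@jax.jit
def forward_step(params, state, action):
    """Forward path only: no reward head, reward comes from the formula."""
    next_state = dynamics_apply(params, state, action)
    reward = reward_fn(state, action, next_state)
    done = jnp.asarray(False)  # HalfCheetah terminates on time limit only
    return next_state, reward, done
```

Keeping `reward_fn` outside the learned model also helps with the accumulated-error concern: over a ~1000-step rollout, only the state prediction can drift, and the rollout can be compared step by step against the actual Gym data.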

Pedram: find the hyperparameters that raise the ReBRAC score on HalfCheetah from around 45 to 65.
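A simple way to organize that search is a grid over ReBRAC's two behavior-cloning penalty coefficients (actor and critic). The coefficient names, value ranges, and `launch_run` helper below are all hypothetical placeholders, not the project's actual sweep:

```python
from itertools import product

# Hypothetical sweep over ReBRAC's actor/critic BC penalty weights;
# the values are illustrative, not tuned.
actor_bc = [0.001, 0.01, 0.1, 1.0]
critic_bc = [0.0, 0.01, 0.1]

def launch_run(config):
    # stand-in: would start a training job with this config and
    # record its final evaluation score on HalfCheetah
    pass

grid = [{"actor_bc_coef": a, "critic_bc_coef": c}
        for a, c in product(actor_bc, critic_bc)]
for cfg in grid:
    launch_run(cfg)
```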

Action items:

- Jack: add the labels to the main UI and push to the project GitHub.
- Elena: finish all data preparation and feature selection/engineering by next meeting (June 7).
- James: finish the first sequence model for a Gym environment (HalfCheetah) by next meeting (June 7).