salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License

factorize data loader and trainer for batches #63

Closed Emerald01 closed 1 year ago

Emerald01 commented 1 year ago
  1. Factorize the original data loader so that observations, rewards, and actions have a cleaner loading process. This includes: (a) the loading of stepwise placeholders and the loading of batch data are now separated; (b) action placeholders, previously defined outside of create_and_push_data_placeholders(), are now created inside it, so this function is a single API for all data creation; (c) batch placeholders are always separated by policy; (d) each action placeholder for the sampler has shape [env, agent, 1], which is more consistent with the action placeholder of shape [env, agent, num_action_types].
  2. Separate the model from the observation batch definition (previously the observation batch was defined inside the model), so the model is more independent.
  3. Add documentation for our batch definition.
  4. Update the Lightning API.
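
The placeholder layout described in item 1 can be sketched in plain Python as a table of shapes. This is an illustration only, not the code in this PR: the function name `build_placeholder_shapes`, its arguments, the dictionary keys, and the leading `batch_len` dimension on batch placeholders are all assumptions made for the example. Only the shapes [env, agent, 1] and [env, agent, num_action_types], and the per-policy separation of batch placeholders, come from the description above.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def build_placeholder_shapes(
    num_envs: int,
    policy_to_agent_ids: Dict[str, List[int]],
    policy_to_action_dims: Dict[str, List[int]],
    obs_dim: int,
    batch_len: int,
) -> Tuple[Dict[str, Dict[str, Shape]], Dict[str, Dict[str, Shape]]]:
    """Hypothetical helper: return (stepwise, batch) placeholder shapes.

    Stepwise and batch placeholders are built separately (item 1a),
    and both are keyed by policy name (item 1c).
    """
    stepwise: Dict[str, Dict[str, Shape]] = {}
    batch: Dict[str, Dict[str, Shape]] = {}

    for policy, agent_ids in policy_to_agent_ids.items():
        num_agents = len(agent_ids)
        num_action_types = len(policy_to_action_dims[policy])

        # Stepwise placeholders: one combined action array, plus one
        # [env, agent, 1] array per action type for the sampler (item 1d).
        step_entry: Dict[str, Shape] = {
            "actions": (num_envs, num_agents, num_action_types),
        }
        for i in range(num_action_types):
            step_entry[f"sampled_actions_{i}"] = (num_envs, num_agents, 1)
        stepwise[policy] = step_entry

        # Batch placeholders: assumed here to add a leading time dimension.
        batch[policy] = {
            "observations": (batch_len, num_envs, num_agents, obs_dim),
            "actions": (batch_len, num_envs, num_agents, num_action_types),
            "rewards": (batch_len, num_envs, num_agents),
        }

    return stepwise, batch


# Example: two policies, where "runner" has two action types.
stepwise_shapes, batch_shapes = build_placeholder_shapes(
    num_envs=4,
    policy_to_agent_ids={"runner": [0, 1, 2], "tagger": [3]},
    policy_to_action_dims={"runner": [5, 3], "tagger": [5]},
    obs_dim=8,
    batch_len=10,
)
```

Each per-type sampler shape differs from the combined action shape only in its last dimension, which is the consistency that item 1(d) points to.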