quangr / DB-Football

A Simple, Distributed and Asynchronous Multi-Agent Reinforcement Learning Framework for Google Research Football AI.

Code Note #1

Open quangr opened 1 year ago

quangr commented 1 year ago
  1. Why does the GPU appear unused while the mean reward keeps going up? Because `rollout_metric_cfgs.reward.init_list` is set too low (see the first sketch after this list).
  2. How is the two agents' data trained through a shared network? The collected data is split per agent and flattened before training (second sketch below): https://github.com/quangr/DB-Football/blob/d5ae999fbb12aaa309e109e06f443adebc15d2bb/light_malib/training/data_generator.py#L111
  3. What does the prefetcher do? It fetches data from the rollout workers asynchronously (third sketch below).
  4. Where do the rollout combinations come from? They come from strategy planning: PSRO computes the Nash equilibrium of the payoff matrix (fourth sketch below).
  5. Is the asynchronous data on-policy? The `psro_scheduler` generates a `training_desc` that realizes the Nash equilibrium over the previous policies. If `share_policies` is set to 1, the training agent is always `agent_0`, and there is a `random_permute` that shuffles the agents' positions (fifth sketch below). So when the scenario is asymmetric, the data is not on-policy.
  6. What does `update_func` do? It collects evaluation data and calculates the payoff matrix (last sketch below).
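
For item 1, a minimal sketch of why a low `init_list` makes the mean reward climb, assuming the config pre-fills a fixed-size sliding-window metric (the names and numbers below are illustrative, not the repo's):

```python
from collections import deque

# Assumption: rollout_metric_cfgs.reward.init_list seeds a sliding-window
# metric. If the window starts full of low placeholder values, the reported
# mean rises batch after batch as real rewards push the placeholders out,
# even when the policy itself is not improving.
window = deque([-10.0] * 100, maxlen=100)  # hypothetical low init_list

for batch in range(5):
    window.extend([0.5] * 20)  # 20 real episode rewards per rollout batch
    print(f"batch {batch}: mean reward = {sum(window) / len(window):.2f}")
# Prints -7.90, -5.80, -3.70, -1.60, 0.50: an "improving" curve from warm-up alone.
```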
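
For item 2, a hedged sketch of the split-and-flatten step described around `data_generator.py#L111`: trajectories arrive with an agent axis, and for a shared network each agent's slice just becomes another training sample. Variable names here are illustrative, not the repo's:

```python
import numpy as np

def flatten_shared_agent_batch(obs: np.ndarray, actions: np.ndarray):
    """obs: (B, N, obs_dim), actions: (B, N) -> flattened (B * N, ...) arrays."""
    batch, num_agents = actions.shape
    flat_obs = obs.reshape(batch * num_agents, -1)  # each agent becomes its own sample
    flat_actions = actions.reshape(batch * num_agents)
    return flat_obs, flat_actions

obs = np.random.randn(32, 2, 115)          # 32 steps, 2 agents, GRF-style 115-dim obs
actions = np.random.randint(0, 19, (32, 2))
flat_obs, flat_actions = flatten_shared_agent_batch(obs, actions)
print(flat_obs.shape, flat_actions.shape)   # (64, 115) (64,)
```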
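
For item 3, a minimal sketch of an asynchronous prefetcher (my reading of the behavior, not the repo's exact class): a background thread keeps pulling finished rollout batches into a bounded queue so the trainer rarely blocks on data collection.

```python
import queue
import threading

class Prefetcher:
    def __init__(self, fetch_fn, capacity: int = 4):
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self._fetch_fn = fetch_fn  # e.g. a remote call into the rollout workers
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            self._queue.put(self._fetch_fn())  # blocks when the queue is full

    def get(self):
        return self._queue.get()  # blocks only if no batch is ready yet

# Usage: prefetcher = Prefetcher(rollout_worker.pull_batch); batch = prefetcher.get()
```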
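
For item 4, PSRO needs a Nash equilibrium of the meta-game payoff matrix. Fictitious play is one standard way to approximate it for a two-player zero-sum matrix game; the repo may use a different solver, so treat this as a stand-in:

```python
import numpy as np

def fictitious_play(payoff: np.ndarray, iters: int = 10_000):
    """payoff[i, j] = row player's payoff for policy i vs column policy j."""
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(payoff @ col_counts)] += 1
        col_counts[np.argmin(row_counts @ payoff)] += 1  # column player minimizes
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: the equilibrium mixes 50/50 over both policies.
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(fictitious_play(payoff))
```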
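
For item 5, a sketch of my reading of the seat assignment (names illustrative): the trainer always optimizes `agent_0`, so the scheduler randomly permutes which environment seat each policy occupies. In a symmetric game the permutation is harmless; in an asymmetric one the collected data mixes seats, which is why it is not strictly on-policy.

```python
import random

def assign_seats(policies=("training_policy", "opponent_policy")):
    seats = list(policies)
    random.shuffle(seats)  # random_permute analogue
    return {f"agent_{i}": p for i, p in enumerate(seats)}

print(assign_seats())  # e.g. {'agent_0': 'opponent_policy', 'agent_1': 'training_policy'}
```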
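
For item 6, a hedged sketch of the payoff bookkeeping that `update_func` appears to do: keep a running average of match outcomes between every pair of policies. Class and method names are illustrative:

```python
import numpy as np

class PayoffTable:
    def __init__(self, num_policies: int):
        self.payoff = np.zeros((num_policies, num_policies))
        self.counts = np.zeros((num_policies, num_policies))

    def update(self, row_policy: int, col_policy: int, result: float):
        """result: e.g. +1 win / 0 draw / -1 loss from the row policy's view."""
        self.counts[row_policy, col_policy] += 1
        n = self.counts[row_policy, col_policy]
        self.payoff[row_policy, col_policy] += (result - self.payoff[row_policy, col_policy]) / n

table = PayoffTable(num_policies=3)
table.update(0, 1, +1.0)
table.update(0, 1, -1.0)
print(table.payoff[0, 1])  # 0.0 after one win and one loss
```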
quangr commented 1 year ago

I'm done looking at the code. Now I'm trying two things:

  1. Plug in the tianshou policy framework and use PPO to train a 1v1 agent. ~First step was to create a dummy agent to fill the other agent slot~ ~It seems that in the 1v1 scenario the action should be masked~ It turns out the other player can be set as uncontrollable... The debugging is so annoying; there should be some mock functions! The code is a complete MESS. It seems that using the tianshou framework requires a buffer, so I should write a new collect function (rough sketch below).
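
A rough sketch of that collect function. Assumptions: a tianshou release whose `ReplayBuffer.add` accepts a `Batch` with `obs/act/rew/done/obs_next/info` keys (the signature varies across versions), and a gym-style env whose `step()` returns a 4-tuple; `env` and `policy_fn` are stand-ins, not names from this repo.

```python
from tianshou.data import Batch, ReplayBuffer

def collect(env, policy_fn, buffer: ReplayBuffer, n_steps: int):
    obs = env.reset()
    for _ in range(n_steps):
        act = policy_fn(obs)
        obs_next, rew, done, info = env.step(act)
        buffer.add(Batch(obs=obs, act=act, rew=rew, done=done,
                         obs_next=obs_next, info=info))
        obs = env.reset() if done else obs_next

buffer = ReplayBuffer(size=20_000)
# collect(env, policy_fn, buffer, n_steps=2048)  # then hand the buffer to PPO
```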