Code Note - GitHub Issues

Why is the GPU not working while the mean reward keeps going up? Because rollout_metric_cfgs.reward.init_list is set too low.
How is the two agents' data trained with a shared network? The collected data is split per agent and flattened during training: https://github.com/quangr/DB-Football/blob/d5ae999fbb12aaa309e109e06f443adebc15d2bb/light_malib/training/data_generator.py#L111
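
A rough sketch of one way to read this, using plain Python lists in place of arrays: the per-step samples are separated by agent, then merged into a single batch that one shared network could consume. The function name and data layout are hypothetical and are not taken from data_generator.py.

```python
def split_and_flatten(episode, num_agents):
    """Separate samples by agent, then merge them into one flat batch.

    `episode` is assumed to be a list of steps, each step a list holding one
    sample per agent. This layout is an assumption made for illustration.
    """
    per_agent = [[] for _ in range(num_agents)]
    for step in episode:
        for agent_idx, sample in enumerate(step):
            per_agent[agent_idx].append(sample)
    # A shared network does not care which agent produced a sample,
    # so the per-agent trajectories are concatenated along the batch axis.
    flat = [sample for samples in per_agent for sample in samples]
    return per_agent, flat


# Two steps, two agents; each sample is tagged with (agent, step).
episode = [
    [("agent_0", 0), ("agent_1", 0)],
    [("agent_0", 1), ("agent_1", 1)],
]
per_agent, flat = split_and_flatten(episode, num_agents=2)
print(per_agent)
print(flat)
```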
What does the prefetcher do? It fetches data from the rollout asynchronously.
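
A minimal sketch of the general idea, assuming a background thread and a bounded queue. The Prefetcher class and fake_rollout_batches generator below are hypothetical illustrations, not the classes used in DB-Football.

```python
import queue
import threading


class Prefetcher:
    """Hypothetical prefetcher: a background thread fills a bounded queue."""

    _DONE = object()  # sentinel marking the end of the source

    def __init__(self, source, buffer_size=2):
        self._queue = queue.Queue(maxsize=buffer_size)
        self._thread = threading.Thread(
            target=self._worker, args=(source,), daemon=True
        )
        self._thread.start()

    def _worker(self, source):
        for batch in source:
            # Blocks while the buffer is full, so the producer cannot run
            # arbitrarily far ahead of the trainer.
            self._queue.put(batch)
        self._queue.put(self._DONE)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until a batch is ready
        if item is self._DONE:
            raise StopIteration
        return item


def fake_rollout_batches(n):
    """Stand-in for rollout data; yields n dummy batches."""
    for i in range(n):
        yield {"batch_id": i}


fetched = [batch["batch_id"] for batch in Prefetcher(fake_rollout_batches(5))]
print(fetched)  # [0, 1, 2, 3, 4]
```

The bounded queue is the design choice that matters here: a small buffer keeps the training loop fed while limiting how stale the fetched data can get.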
Where do the rollout combinations come from? They come from strategy planning; PSRO computes the Nash equilibrium.
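
To make the Nash equilibrium step concrete, here is a small sketch that approximates the equilibrium of a two-player zero-sum payoff matrix with fictitious play. Fictitious play is only one common choice of solver and is an assumption here; these notes do not say which solver DB-Football uses, and fictitious_play is a hypothetical name.

```python
def fictitious_play(payoff, iterations=10000):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player gets -payoff[i][j].
    Returns the empirical mixed strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = 1  # arbitrary initial actions
    col_counts[0] = 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical action counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return (
        [count / row_total for count in row_counts],
        [count / col_total for count in col_counts],
    )


# Matching pennies: the unique equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_strategy, col_strategy = fictitious_play(matching_pennies)
print(row_strategy, col_strategy)  # both should be close to [0.5, 0.5]
```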
Is the asynchronous data on-policy? The psro_scheduler generates a training_desc that achieves the Nash equilibrium over the former policies. If share_policies is set to 1, the training agent is always set to agent_0, and a random_permute changes the agents' positions. So when the setup is asymmetric, the data is not on-policy.
What does update_func do? It collects data and computes the payoff matrix.
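
As an illustration of what collecting results into a payoff matrix can look like, the sketch below averages the rewards reported for each policy pair. PayoffTable and the policy ids are hypothetical; the real update_func in DB-Football may store and aggregate results differently.

```python
from collections import defaultdict


class PayoffTable:
    """Hypothetical running-average payoff table keyed by policy pair."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, row_policy, col_policy, reward):
        """Record one rollout result for the given pair of policies."""
        key = (row_policy, col_policy)
        self._sum[key] += reward
        self._count[key] += 1

    def matrix(self, row_policies, col_policies):
        """Return mean payoffs as a nested list; unseen pairs default to 0.0."""
        result = []
        for row in row_policies:
            values = []
            for col in col_policies:
                count = self._count.get((row, col), 0)
                total = self._sum.get((row, col), 0.0)
                values.append(total / count if count else 0.0)
            result.append(values)
        return result


table = PayoffTable()
for row, col, reward in [("p0", "q0", 1.0), ("p0", "q0", 0.0), ("p0", "q1", -1.0)]:
    table.update(row, col, reward)

payoff = table.matrix(["p0"], ["q0", "q1"])
print(payoff)  # [[0.5, -1.0]]
```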

quangr / DB-Football