What does the prefetcher do?
It fetches data from rollouts asynchronously.
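A minimal sketch of the asynchronous prefetch idea, assuming a background thread fills a bounded queue while the trainer consumes from it. The names `prefetch` and `fake_rollout` are hypothetical and not taken from the actual code base.

```python
import queue
import threading


def prefetch(rollout_fn, num_batches, maxsize=2):
    """Yield batches that rollout_fn produces on a background thread."""
    buf = queue.Queue(maxsize=maxsize)
    sentinel = object()  # marks the end of the stream

    def worker():
        for i in range(num_batches):
            buf.put(rollout_fn(i))  # blocks while the buffer is full
        buf.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()  # blocks until the worker has produced something
        if item is sentinel:
            return
        yield item


def fake_rollout(i):
    # Hypothetical stand-in for a real rollout call.
    return {"batch_id": i, "rewards": [i, i + 1]}


batches = list(prefetch(fake_rollout, num_batches=3))
```

Because the queue is bounded, the worker can run at most `maxsize` batches ahead of the consumer, which limits how stale the fetched data can get.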
Where do the rollout combinations come from?
They come from strategy planning, and psro calculates the Nash equilibrium.
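To make the Nash equilibrium step concrete, here is a sketch that approximates the equilibrium of a two-player zero-sum meta-game with fictitious play. This is an assumption for illustration only: the notes do not say which solver psro actually uses, and `fictitious_play` is a hypothetical name.

```python
def fictitious_play(payoff, iters=20000):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative of it. Returns the empirical mixtures of both players.
    """
    n, m = len(payoff), len(payoff[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row_counts[0] = 1  # arbitrary initial action for each player
    col_counts[0] = 1
    for _ in range(iters):
        # Value of each action against the opponent's empirical play so far.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(m))
                      for i in range(n)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n))
                      for j in range(m)]
        # Row player maximizes, column player minimizes the row payoff.
        row_counts[max(range(n), key=row_values.__getitem__)] += 1
        col_counts[min(range(m), key=col_values.__getitem__)] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Rock-paper-scissors as a toy payoff matrix between three policies.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
row_mix, col_mix = fictitious_play(rps)
```

The returned mixtures can then be used as sampling weights when choosing which former policies the training agent plays against.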
Is the asynchronous data on-policy?
The psro_scheduler generates a training_desc that reaches a Nash equilibrium over the former policies. If share_policies is set to 1, the training agent is always set to agent_0, and a random_permute changes the agents' positions. So when the game is asymmetric, the data is not on-policy.
What does update_func do?
It collects data and calculates the payoff matrix.
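A rough sketch of what collecting results and computing a payoff matrix could look like, assuming each episode reports the two policies involved and the reward of the first one. The helpers `update_payoff` and `payoff_matrix` are hypothetical and are not the real update_func.

```python
from collections import defaultdict


def update_payoff(stats, policy_a, policy_b, reward_a):
    """Accumulate one episode result for the (policy_a, policy_b) matchup."""
    total, count = stats[(policy_a, policy_b)]
    stats[(policy_a, policy_b)] = (total + reward_a, count + 1)


def payoff_matrix(stats, policies):
    """Return the mean reward of each row policy against each column policy."""
    matrix = []
    for a in policies:
        row = []
        for b in policies:
            total, count = stats.get((a, b), (0.0, 0))
            # Matchups that were never played default to 0.0.
            row.append(total / count if count else 0.0)
        matrix.append(row)
    return matrix


stats = defaultdict(lambda: (0.0, 0))
episodes = [("p0", "p1", 1.0), ("p0", "p1", -1.0), ("p1", "p0", 1.0)]
for policy_a, policy_b, reward_a in episodes:
    update_payoff(stats, policy_a, policy_b, reward_a)

matrix = payoff_matrix(stats, ["p0", "p1"])
```

Storing running totals and counts rather than means keeps the update cheap and lets new rollout results be merged in as they arrive.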
Plug in the tianshou policy framework and use PPO to train a 1v1 agent.
~The first thing is to create a dummy agent to fill in for the other agent~
~It seems that in the 1v1 scenario, actions should be masked~
It turns out a player can be set as uncontrollable...
Debugging is so annoying; there should be some mock functions!
The code is a complete MESS. It seems that using the tianshou framework requires a buffer, so I should write a new collect function.