Closed wubmu closed 2 years ago
你好,这里的意思是多少轮train一次~总之就是采样步数和训练次数的比例很重要。
你好,我在你们的文章中看到S=E_P_I。 where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. 这里的policy iterations是指的target_update_interval还是多少轮train一次
你好,我在你们的文章中看到S=EPI。 where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. 这里的policy iterations是指的target_update_interval还是多少轮train一次