Open lucywang720 opened 1 year ago
We are reproducing this paper, but we are confused about r_A. May I ask how to calculate r_A from the per-step rewards produced by the PRM? I would appreciate your help!