ymzhang01 / focops

Pytorch Implementation for First Order Constrained Optimization in Policy Space (FOCOPS).

About the evaluation. #3

Open Zarzard opened 1 year ago

Zarzard commented 1 year ago

Hello! I have read your FOCOPS paper and would like to reproduce the results. I found that in the MuJoCo environments episode lengths vary widely, from fewer than 20 steps to more than 2000. If the total safety budget per episode is fixed regardless of episode length, the per-step cost the policy can afford changes from episode to episode, which seems hard for the policy to adapt to. I also found that a naive sac-lag algorithm performs poorly under the same constraint as FOCOPS: as the episode gets longer, the total cost grows as well. How do you handle this problem? (BTW, I couldn't find the evaluation part in the focops code. Did you fix the episode length during training or evaluation?)
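
To make the concern concrete, here is a minimal sketch of the arithmetic. The budget, cost rate, and episode lengths are made-up values for illustration, and the helper names are hypothetical; none of this is taken from the focops code.

```python
def per_step_allowance(episode_budget: float, episode_length: int) -> float:
    """Average cost per step that keeps the episode total within the budget."""
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return episode_budget / episode_length


def episode_cost(cost_per_step: float, episode_length: int) -> float:
    """Total cost of an episode if the policy incurs a constant cost per step."""
    return cost_per_step * episode_length


# Assumed, illustrative values (not from the paper or the repository).
EPISODE_BUDGET = 25.0   # fixed cost limit per episode
COST_PER_STEP = 0.05    # constant average cost rate of some policy

for length in (20, 200, 2000):
    allowance = per_step_allowance(EPISODE_BUDGET, length)
    total = episode_cost(COST_PER_STEP, length)
    status = "within budget" if total <= EPISODE_BUDGET else "over budget"
    print(f"length={length:5d}  allowance/step={allowance:.4f}  "
          f"total cost={total:.2f}  ({status})")
```

With a fixed per-episode budget, the affordable cost per step shrinks in proportion to the episode length, so a policy with a roughly constant cost rate satisfies the constraint on short episodes and violates it on long ones.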