Closed ZhaoRunyi closed 1 month ago
Hi @ZhaoRunyi, unfortunately we do not have the number of raw trajectories, because the filtering is done during the data collection process; otherwise, storing the entire dataset after one RL training run would incur huge storage and memory consumption. The ratios of the different methods are not that important; the cost thresholds matter more. The goal is to collect diverse trajectories that cover as much of the reward-cost plot as possible, so the specific algorithm that collects these trajectories is not critical.
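To make the idea concrete, here is a minimal sketch of what an online density filter over the (reward, cost) plane could look like. This is an illustrative assumption, not the authors' actual code: the class name, bin sizes, and per-bin cap are all hypothetical.

```python
from collections import defaultdict

class DensityFilter:
    """Accept a trajectory only if its (reward, cost) bin is not yet full,
    so the retained set spreads across the reward-cost plot instead of
    piling up where the collecting policy converges.
    All parameters here are illustrative, not the paper's values."""

    def __init__(self, reward_bin=10.0, cost_bin=5.0, max_per_bin=20):
        self.reward_bin = reward_bin    # width of a bin along the reward axis
        self.cost_bin = cost_bin        # width of a bin along the cost axis
        self.max_per_bin = max_per_bin  # cap on trajectories kept per bin
        self.counts = defaultdict(int)

    def accept(self, episode_reward, episode_cost):
        # Discretize the episode's return and cost into a 2-D bin key.
        key = (int(episode_reward // self.reward_bin),
               int(episode_cost // self.cost_bin))
        if self.counts[key] >= self.max_per_bin:
            return False  # bin already dense enough; discard on the fly
        self.counts[key] += 1
        return True

# During collection, a trajectory is stored only when accept(...) returns
# True, so the raw (unfiltered) set never has to be kept after training --
# which is why the raw trajectory count is unavailable.
```

Because the check runs per episode as data is collected, only the accepted trajectories ever touch disk, matching the storage argument above.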
Hi! I've read your paper and it's great work. But although the paper records the number of trajectories under different settings of a task (for instance, in the SafetyPointGoal task, the hard-constraint setting `SafetyPointGoal1-v0` has 2022 trajectories while the soft-constraint setting `SafetyPointGoal2-v0` has 3442) and claims that this is the number of trajectories after applying the density filter, it's still unclear that:

Thank you for answering my questions in advance, and I'm looking forward to your reply.