shamilmamedov / flexible_arm


Visualizations / Plots #24

Open Erfi opened 1 year ago

Erfi commented 1 year ago

Description: To help us decide how to run the evaluations and which data to save during training / evaluation, we should decide which plots we would like to see in the paper.

shamilmamedov commented 1 year ago

It's a very good question. Here are my thoughts

What do you think? What else could we add?

Erfi commented 1 year ago

1) A table of (mean rewards, mean path length) over 10-20 evaluation rollouts for all IL, RL, IRL [without safety filter]

Select the winner (DAGGER, SAC, MPC, DAGGER+SF, SAC+SF, MPC_HOR1, MPC_HOR2)

2) Scatter plot with x-axis: computation time per action, y-axis: number of constraint violations (best is lower left)

3) Discussion

4) Video
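As a rough sketch of how the table in point 1 could be assembled from saved rollouts: the function name, the input layout (a list of `(total_reward, path_length)` pairs per algorithm) and the numbers are assumptions made for illustration, not the repository's actual data format or results.

```python
from statistics import mean, stdev


def summarize_rollouts(rollouts):
    """Build one table row per algorithm, sorted by mean reward (best first).

    rollouts: dict mapping an algorithm name to a list of
              (total_reward, path_length) tuples, one per evaluation rollout.
              This layout is an assumption for the sketch.
    """
    rows = []
    for name, results in rollouts.items():
        rewards = [reward for reward, _ in results]
        lengths = [length for _, length in results]
        rows.append(
            {
                "algo": name,
                "n": len(results),
                "mean_reward": mean(rewards),
                # stdev needs at least two samples
                "std_reward": stdev(rewards) if len(rewards) > 1 else 0.0,
                "mean_path_length": mean(lengths),
            }
        )
    rows.sort(key=lambda row: row["mean_reward"], reverse=True)
    return rows


# Placeholder numbers only, not measured results.
example = {
    "DAGGER": [(-1.0, 2.0), (-2.0, 3.0)],
    "SAC": [(-2.0, 3.0), (-4.0, 5.0)],
}
table = summarize_rollouts(example)
for row in table:
    print(row)
```

Sorting by mean reward puts the candidate "winner" in the first row; with 10-20 rollouts per algorithm the standard deviation column helps judge whether the gap between rows is meaningful.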

Erfi commented 1 year ago

@shamilmamedov for point 1 (comparing all algorithms based on reward) I can plot a boxplot of the trajectory rewards. This one uses only 5 rewards per algorithm; a later version will use 100 trajectories for each algorithm (currently waiting for the trajectory collection to finish). kpi_rewards

Right now the goals are sampled everywhere (100 trajectories for each algorithm, with 100 randomly sampled goals). You mentioned before that we might also want trajectories where the goals are sampled near the wall. In that case, can you give me the start and end for qa_goal? We will need to change qa_goal to qa_goal_start and qa_goal_end and sample in that range.

shamilmamedov commented 1 year ago

I thought that in point one we compare only the IL algorithms in the env without the goal and choose the best one for further experiments.

Since our reward is -dt * ||p_goal - p_ee||_2, we indeed don't need to plot the path length separately.
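A minimal sketch of the reward above, summed over a trajectory. The function name, array shapes and the value of `dt` are assumptions for illustration; only the formula itself comes from the comment.

```python
import numpy as np


def trajectory_reward(p_goal, p_ee_traj, dt):
    """Sum of per-step rewards -dt * ||p_goal - p_ee||_2 over a trajectory.

    p_goal:    goal position, shape (3,)
    p_ee_traj: end-effector positions, shape (T, 3), one row per time step
    dt:        time step in seconds (value is an assumption in the example)
    """
    p_goal = np.asarray(p_goal, dtype=float)
    p_ee_traj = np.asarray(p_ee_traj, dtype=float)
    # Euclidean distance to the goal at every time step
    distances = np.linalg.norm(p_ee_traj - p_goal, axis=1)
    return -dt * distances.sum()


# Two steps: 5 m from the goal, then exactly at the goal.
total = trajectory_reward([0.0, 0.0, 0.0], [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]], dt=0.1)
print(total)
```

The sum approximates the time integral of the distance to the goal, so a policy that takes a long detour or approaches slowly already accumulates a more negative reward.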

For point two, it would be great if we could sample the goal position and the initial position from different sets. That is what you meant, right? I will provide the ranges tomorrow.

Erfi commented 1 year ago

yes, exactly. For point one the goal can be anywhere. But for point two let's have specific ranges.

shamilmamedov commented 1 year ago

For the second point, the range of angles for the goal position is:


import numpy as np
# Lower and upper bounds of the goal angles (goals sampled near the wall)
qa_range_start = np.array([-np.pi / 12, 0.0, -np.pi + 0.2])
qa_range_end = np.array([-np.pi / 12, np.pi / 2, 0.0])
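One possible way to draw goals from this range is uniform sampling per angle, sketched below. The function name, the seed and the choice of a uniform distribution are assumptions; the thread only fixes the two bounds.

```python
import numpy as np

qa_range_start = np.array([-np.pi / 12, 0.0, -np.pi + 0.2])
qa_range_end = np.array([-np.pi / 12, np.pi / 2, 0.0])


def sample_qa_goals(rng, n, start=qa_range_start, end=qa_range_end):
    """Draw n goal configurations uniformly between start and end.

    The first angle has identical bounds, so it stays fixed at -pi/12.
    """
    u = rng.random((n, start.size))  # uniform samples in [0, 1)
    return start + u * (end - start)


rng = np.random.default_rng(0)  # fixed seed so evaluation goals are reproducible
goals = sample_qa_goals(rng, 100)
print(goals.shape)
```

Using the same seeded generator for every algorithm would give all of them the same 100 goals, which keeps the comparison fair.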

Erfi commented 1 year ago

kpi_reward_constraint_scatter (with 10 trajectories per algo, goal sampled near the wall)

This shows that using the safety wrapper affects the performance (reward) quite a lot. We should definitely do what you said and set ref_u to something more meaningful than zeros.

shamilmamedov commented 1 year ago

It's to be expected that the safety wrapper will affect the performance; it is going to make the policy more conservative. However, we should indeed tune the safety filter as much as we can. I'll try to fix ref_u now.

Erfi commented 1 year ago

Another plot, this time (inference time) vs (violations): kpi_time_constraint_scatter

shamilmamedov commented 1 year ago

Surprisingly, the constraint violations don't go down much when using the safety filter. That's not good news :(

Erfi commented 1 year ago

True, for SAC it seems to be helping more tho

shamilmamedov commented 1 year ago

We need to compute the gain in terms of constraint violations from using the SF. Maybe we don't see the improvement because the range of the x-axis is so large.
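One way to put a number on that gain is the relative reduction in violations with respect to the unfiltered policy. This is only a sketch of one possible metric; the function name and the example counts are made up for illustration.

```python
def violation_reduction(violations_without_sf, violations_with_sf):
    """Relative reduction in constraint violations when the SF is added.

    Returns a fraction: 0.25 means 25% fewer violations, a negative value
    means the safety filter made things worse. Returns None when the
    unfiltered policy has no violations, since the ratio is undefined.
    """
    if violations_without_sf == 0:
        return None
    return (violations_without_sf - violations_with_sf) / violations_without_sf


# Made-up counts, not measured results.
print(violation_reduction(20, 15))  # fewer violations with the filter
print(violation_reduction(10, 12))  # more violations with the filter
```

A relative number like this does not depend on the axis scaling of the scatter plot, so small improvements stay visible.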

Erfi commented 1 year ago

I am not clear on what you mean by that. Can you explain a bit more? And how do you think we could implement it?

shamilmamedov commented 1 year ago

Sorry, I was talking about the figure.

Erfi commented 1 year ago

kpi_time_constraint_scatter

It seems like for DAGGER it actually increases the violations slightly

shamilmamedov commented 1 year ago

Too bad. Perhaps we should separate the constraints into critical like the wall and the ground, and non-critical ones like joint constraints
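The split proposed here could be counted per step as in the sketch below. The constraint labels ("wall", "ground", "joint_limit") and the per-step set format are assumptions for illustration and may not match how the environment reports violations.

```python
CRITICAL_CONSTRAINTS = frozenset({"wall", "ground"})  # hypothetical labels


def count_violations(steps, critical=CRITICAL_CONSTRAINTS):
    """Count steps with at least one critical / non-critical violation.

    steps: iterable of sets, each holding the names of the constraints
           violated at that time step (empty set = no violation).
    """
    counts = {"critical": 0, "non_critical": 0}
    for violated in steps:
        if any(name in critical for name in violated):
            counts["critical"] += 1
        if any(name not in critical for name in violated):
            counts["non_critical"] += 1
    return counts


# Toy rollout: 4 steps with different violation patterns.
rollout = [set(), {"joint_limit"}, {"wall", "joint_limit"}, {"ground"}]
counts = count_violations(rollout)
print(counts)
```

Keeping both counters lets the same logged data produce either plot: all constraints, or only the wall and ground constraints.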

Erfi commented 1 year ago

I think that might be a good idea. Let me replot with only the wall and ground constraints and see what happens.

Erfi commented 1 year ago

kpi_time_constraint_scatter Yes! (only wall and ground constraints)

shamilmamedov commented 1 year ago

Awesome! These constraints are really critical, so we can justify why we chose them.

shamilmamedov commented 1 year ago

Ideally, we should also change the stiffness of the joints and see how robust the policies are. For that, some time ago I changed SymbolicFlexibleArm3DOF to accept the flexibility params. If we could somehow pass those params through the env, then we could test the policies and put the results in the paper as well.

Erfi commented 1 year ago

So for the joint-flexibility experiment, we want to see the performance of the trained algorithms on a version of the environment that has different flexibility parameters. What about the MPC or the SafetyFilter? As I understand it, the MPC will then use the old flexibility parameters (there will be a deliberate mismatch). What about the SafetyFilter? Should the SF be based on the new flexibility parameters or the old ones?

shamilmamedov commented 1 year ago

The SF will also use the old flexibility parameters. We will introduce model-plant mismatch at all levels. I speculate that the SF might be a bottleneck.

Erfi commented 1 year ago

@shamilmamedov Ok, so for the robustness plot (modifying flexibility parameters):

  1. What I see is that we can input a path to a file with the flexibility parameters in it. Do we have a file with modified parameters (more flexible?), and if so, what's the path?
  2. How should we visualize it? E.g. a scatter plot with x-axis: constraint violations and y-axis: rewards, with purple dots for the matching model (what we currently have) and orange for the mismatched (more flexible) env?

shamilmamedov commented 1 year ago

In the models folder, there are two .yml files: flexible_link_params.yml and flexible_link_params_test.yml. If no path is provided, SymbolicFlexibleArm3DOF defaults to flexible_link_params.yml. If we pass the path to flexible_link_params_test.yml through fparams_path to SymbolicFlexibleArm3DOF, then we will have a model with different flexibility parameters.

Right now the test flexibility parameters (only the stiffness) are 10% different from the training ones.
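For illustration, a 10% stiffness change could be generated as below. The dictionary keys, the values and the direction of the change (here +10%) are assumptions; the actual layout of the .yml files and the sign of the perturbation are not stated in this thread.

```python
def scale_stiffness(params, factor):
    """Return a copy of params with every stiffness value multiplied by factor.

    params: dict with a "stiffness" list (hypothetical key name); all other
            entries, such as damping, are copied unchanged.
    """
    perturbed = dict(params)
    perturbed["stiffness"] = [k * factor for k in params["stiffness"]]
    return perturbed


# Placeholder values, not the parameters from flexible_link_params.yml.
train_params = {"stiffness": [100.0, 200.0], "damping": [1.0, 2.0]}
test_params = scale_stiffness(train_params, 1.1)
print(test_params)
```

Sweeping `factor` over several values (for example 0.8 to 1.2) would show how quickly each policy degrades as the mismatch grows, rather than testing a single 10% point.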

shamilmamedov commented 1 year ago

Scatter plots sound good to me. If we don't have space, then we can use a table.