Closed: Erfi closed this issue 1 year ago.
I think the problem can be fixed by using controller.set_reference_point() inside the reset function
This is also a comment for #9
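Roughly what that could look like as a thin wrapper around the environment (a sketch only: the constructor signature and the x_final / p_ee_final attribute names are assumptions for illustration, not the repo's exact API):

```python
# Sketch: push the newly sampled goal into the MPC on every reset,
# so the controller never keeps tracking a stale reference point.
class GoalSyncedFlexibleArmEnv(FlexibleArmEnv):
    def __init__(self, options, controller):
        super().__init__(options)          # assumed constructor signature
        self.controller = controller

    def reset(self):
        observation = super().reset()      # samples a new random goal if qa_range_end is non-zero
        # re-sync the MPC reference with the freshly sampled goal
        self.controller.set_reference_point(self.x_final, self.p_ee_final)
        return observation
```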
The fix above adds the goal to the observation space. The observation space had to be expanded to include the goal state plus the goal coordinates (nx + 3), because the MPC's controller.set_reference_point(x_ref, p_ee_ref, ...) requires both.
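For illustration, the expanded observation could look roughly like this (a sketch assuming a gym-style Box space, with nx as a placeholder state dimension; not the exact repo code):

```python
import numpy as np
from gym import spaces

nx = 12                                   # placeholder state dimension
obs_dim = nx + nx + 3                     # current state + goal state + goal coordinates
observation_space = spaces.Box(low=-np.inf, high=np.inf,
                               shape=(obs_dim,), dtype=np.float64)

def make_observation(x, x_final, p_ee_final):
    # concatenate current state, goal state, and goal end-effector position
    return np.concatenate([x, x_final, p_ee_final])
```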
Closing this issue for now.
I am wondering whether the goal state x_final should be an observation. In a realistic setting the goal state is something we don't know, so making it an observation doesn't seem right to me. I would rather use the final end-effector position xee_final as an observation, or the difference between the current end-effector position and the goal (error-to-goal). What do you think?
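For example, the goal-related part of the observation could simply be the end-effector error (a hypothetical sketch; the argument names are illustrative):

```python
import numpy as np

def goal_observation(p_ee_current: np.ndarray, p_ee_final: np.ndarray) -> np.ndarray:
    # error-to-goal: vector from the current end-effector position to the goal
    return p_ee_final - p_ee_current  # shape (3,)
```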
I completely agree with what you said. The reason for adding x_final was that the MPC needed it for its calculations at the moment. @RudolfReiter Do you think it is possible to eliminate the goal state from the MPC inputs? We had a quick discussion about this and noted that recovering the goal state from the goal position via inverse kinematics might be ambiguous. But if it is possible (perhaps with some heuristics to help), we should probably remove the goal state from the observations and only use the goal coordinates (absolute or relative).
Hi! I agree, it would be cleaner to have the x_ee position in the observation and not the whole state. The MPC would then need to compute the goal state from x_ee internally.
@RudolfReiter So is it okay if I remove the goal_state from the observation space? Can the MPC recover the goal_state from x_ee? If so, I will wait for you to change the MPC and then remove the goal_state from the observations.
@RudolfReiter does Acados require the goal state? Can't we formulate the NMPC without a goal state, or set the goal-state weighting matrices to zero?
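In least-squares tracking-cost terms, the question is whether the block of the weighting matrix that penalises the state error can simply be zeroed, leaving only the end-effector term (a generic sketch with placeholder dimensions, not the repo's actual acados OCP definition):

```python
import numpy as np

nx, n_ee = 12, 3                        # placeholder dimensions
# residual y = [x; p_ee], reference yref = [x_goal; p_ee_goal]
W = np.zeros((nx + n_ee, nx + n_ee))
W[nx:, nx:] = 10.0 * np.eye(n_ee)       # penalise only the end-effector position error
# With the state block zeroed, the value passed for x_goal no longer affects the
# cost, so the goal state would not need to be known accurately.
```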
It's better if we have the goal state in the MPC, since everything is very nonlinear. @Erfi, I will change the MPC for that and let you know.
The realistic way of getting the goal state is: qa_ik followed by get_rest_configuration. After all, the goal state is going to be an approximation of the actual goal state, and I am not sure whether it will help the solver or do the opposite.
BTW, the Mpc3Dof class has a method _get_x_ref for roughly computing the goal state from the goal position.
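Presumably something along these lines, similar to what _get_x_ref already does (a rough sketch: the real signatures of qa_ik and get_rest_configuration in the repo may differ, and the zero-velocity assumption is mine):

```python
import numpy as np

def approximate_goal_state(p_ee_goal):
    # inverse kinematics for the active joints from the goal end-effector position
    qa_goal = qa_ik(p_ee_goal)
    # fill in the passive/flexible coordinates with their static rest values
    q_goal = get_rest_configuration(qa_goal)
    dq_goal = np.zeros_like(q_goal)            # assume the arm is at rest at the goal
    return np.concatenate([q_goal.ravel(), dq_goal.ravel()])
```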
Description: The goal position can be selected randomly within the FlexibleArmEnv class via a call to reset() if the qa_range_end is set to values other than zero inside the FlexibleArmEnvOptions. The MPC controller's goal is set using controller.set_reference_point(...) prior to use, so it will be unaware of changes to the goal after each call to env.reset(). This means the controller will keep driving the robot to its reference point even if the goal has changed. How should we implement the connection between the two?

Possible Implementation:
Acceptance Criteria: One should be able to run the data collection script (test_mpc_data_collection.py) using a non-zero range for the qa_range_end and see the MPC drive the robot arm to the correct goal position.