lucidrains / robotic-transformer-pytorch

Implementation of RT1 (Robotic Transformer) in Pytorch

questions about the implementation #8

Open ssvision opened 1 month ago

ssvision commented 1 month ago

My current setup consists of a Universal Robots 6-DoF UR5e arm along with an OnRobot gripper. An Intel RealSense camera is mounted on the head and is static (assume it's a single-arm humanoid robot with a head-mounted camera). Now, when I run the model, i.e., pass an image and an instruction, the model is supposed to output 7 values for the action space: (x, y, z, roll, pitch, yaw, gripper state). For reference, the way I am instantiating and calling the model is shown after the questions below. My questions are as follows:

1. The action space output from the transformer model, i.e., the end-effector pose: is it defined in the camera frame?
2. What are the pose values? Are they absolute values or delta values? I am a bit confused by the terminology used in the README.
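Here is roughly how I am instantiating and calling the model. The MaxViT and RT1 hyperparameters are copied from the repo README example, except that I set `num_actions = 7` to match my (x, y, z, roll, pitch, yaw, gripper) action space (the README default is 11):

```python
import torch
from robotic_transformer_pytorch import MaxViT, RT1

# vision backbone, hyperparameters as in the repo README example
vit = MaxViT(
    num_classes = 1000,
    dim_conv_stem = 64,
    dim = 96,
    dim_head = 32,
    depth = (2, 2, 5, 2),
    window_size = 7,
    mbconv_expansion_rate = 4,
    mbconv_shrinkage_rate = 0.25,
    dropout = 0.1
)

model = RT1(
    vit = vit,
    num_actions = 7,    # (x, y, z, roll, pitch, yaw, gripper); the README default is 11
    depth = 6,
    heads = 8,
    dim_head = 64,
    cond_drop_prob = 0.2
)

video = torch.randn(1, 3, 6, 224, 224)    # (batch, channels, frames, height, width)
instructions = ['pick up the cup on the table']

logits = model(video, instructions)       # (1, 6, 7, 256): per-frame, per-action-dim bin logits
```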
1343744768 commented 2 weeks ago

I am also confused about the model's output. From the example, if the input is a sequence of 6 frames of images with shape (2, 3, 6, 224, 224), the output is (2, 6, 11, 256). What I imagine is that after applying argmax or something similar, the output should be (2, 6, action). Does this mean the output represents the actions for the next six time steps, or is there some other way to define a single-step action? Below is a sketch of what I mean.
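To make the question concrete, here is the decoding I have in mind. The argmax step and the action bounds are my own guesses, not something the repo documents (RT-1 discretizes each action dimension into 256 bins, so I am mapping each token back to a bin center):

```python
import torch

# dummy logits with the shape from the README example:
# (batch, frames, action_dims, bins) = (2, 6, 11, 256)
logits = torch.randn(2, 6, 11, 256)

# take the argmax over the 256 bins: one discrete token per action dimension
action_tokens = logits.argmax(dim = -1)   # (2, 6, 11)

# recover a continuous value as the bin center; the bounds below are
# hypothetical placeholders, real bounds depend on the robot / dataset
lo, hi = -1.0, 1.0
actions = lo + (action_tokens.float() + 0.5) * (hi - lo) / 256   # (2, 6, 11)
```

Under this reading the second axis would hold one action per frame, which is what makes me wonder whether the six entries are meant as the next six time steps or whether only the last one is used at inference.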