Closed chance20210722 closed 7 months ago
It's a bug. The '6' is a hardcoded length that worked for the Get Up example because each get up behavior has 6 time slots, but it makes no sense to hardcode that information. For context, the observations follow a one-hot encoding scheme. Here are the expected observations for each time step, considering a behavior with 6 time slots:
First, the Reset function is called:
Then, the Step function is called successively:
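A minimal sketch of that sequence, assuming the one-hot vector for time step t has its 1 at index t. The helper name one_hot_obs is hypothetical and is not part of Get_Up.py:

```python
import numpy as np

def one_hot_obs(t, n_slots):
    """Hypothetical helper: one-hot observation for time step t (0-based)."""
    obs = np.zeros(n_slots)
    obs[t] = 1.0
    return obs

n_slots = 6  # a behavior with 6 time slots, as in the Get Up example

# Assumed observation returned by Reset (t=0)
reset_obs = one_hot_obs(0, n_slots)

# Assumed observations returned by successive Step calls (t=1..5)
step_obs = [one_hot_obs(t, n_slots) for t in range(1, n_slots)]

print(reset_obs)
for obs in step_obs:
    print(obs)
```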
For the terminal step (t=6), the returned observation is not used by the learning algorithm. In the current implementation, I returned [0,0,0,0,0,0], i.e. obs = np.zeros(6), as a dummy value. However, to allow behaviors with an arbitrary number of slots, it could be replaced by obs = self.obs[0]. I will fix this soon.
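As a rough illustration of that change, here is a sketch of an environment whose observation table is sized from the number of slots instead of a hardcoded 6. It assumes self.obs is an identity matrix with one row per slot; the class name SlotEnv and its attributes are hypothetical stand-ins, not the actual Get_Up.py code:

```python
import numpy as np

class SlotEnv:
    """Hypothetical stand-in for the environment; not the actual Get_Up.py code."""

    def __init__(self, n_slots):
        self.n_slots = n_slots
        # Assumption: row t of the identity matrix is the one-hot observation for slot t
        self.obs = np.identity(n_slots)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.obs[0]

    def step(self):
        self.t += 1
        terminal = self.t >= self.n_slots
        # The terminal observation is a dummy that the learner ignores,
        # so any vector of the right shape works; self.obs[0] avoids the hardcoded 6
        obs = self.obs[0] if terminal else self.obs[self.t]
        return obs, terminal

env = SlotEnv(n_slots=4)
first = env.reset()
trace = []
done = False
while not done:
    o, done = env.step()
    trace.append(o)
```

Because the dummy value is taken from self.obs, its shape follows the number of slots automatically, whatever that number is.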
In Get_Up.py, line 198 contains the code "obs = np.zeros(6)". Why is the parameter here set to 6?