Closed: naijekux closed this issue 3 months ago.
I also printed some variables inside `step()`:

```python
def step(self, state: State, action: jax.Array) -> State:
    """Runs one timestep of the environment's dynamics."""
    print('step begins')
    assert state.pipeline_state is not None
    x_i = state.pipeline_state.x.vmap().do(
        base.Transform.create(pos=self.sys.link.inertia.transform.pos)
    )
    vec_1 = x_i.pos[self._object_idx] - x_i.pos[self._hand_idx]
    vec_2 = x_i.pos[self._object_idx] - x_i.pos[self._goal_idx]
    reward_near = -math.safe_norm(vec_1)
    reward_dist = -math.safe_norm(vec_2)
    reward_ctrl = -jp.square(action).sum()
    reward = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near
    pipeline_state = self.pipeline_step(state.pipeline_state, action)
    obs = self._get_obs(pipeline_state)
    state.metrics.update(
        reward_dist=reward_dist,
        reward_ctrl=reward_ctrl,
        reward_near=reward_near,
    )
    print('step ends')
    debug.print("vec_1:{x}", x=vec_1)
    debug.print("reward_near:{x}", x=reward_near)
    debug.print("q:{x}", x=pipeline_state.q[:7])
    debug.print("goal_pos:{x}", x=x_i.pos[self._goal_idx])
    return state.replace(pipeline_state=pipeline_state, obs=obs, reward=reward)
```
The outputs are as below:

```
reset starts
reset ends
reset starts
reset ends
step begins
step ends
goal_pos:[ 0.65 -0.15 -0.324]
vec_1:[-0.351 0.354 -1.046]
vec_1:[-0.351 0.354 -1.046]
vec_1:[-0.351 0.354 -1.046]
vec_1:[-0.351 0.354 -1.046]
vec_1:[-0.351 0.354 -1.046]
vec_1:[-0.351 0.354 -1.046]
vec_1:[-0.351 0.354 -1.046]
reward_near:-1.1593074798583984
reward_near:-1.1593074798583984
reward_near:-1.1593074798583984
reward_near:-1.1593074798583984
...
goal_pos:[ 0.65 -0.15 -0.324]
goal_pos:[ 0.65 -0.15 -0.324]
goal_pos:[ 0.65 -0.15 -0.324]
goal_pos:[ 0.65 -0.15 -0.324]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
q:[nan nan nan nan nan nan nan]
```
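As an aside, the output pattern above, where `print('step begins')` / `print('step ends')` appear only once while the `debug.print` lines repeat, is standard JAX behavior under `jit`: Python `print` fires only while the function is traced, whereas `jax.debug.print` is staged into the compiled computation and fires on every call. A minimal sketch (not Brax-specific):

```python
import jax
import jax.numpy as jnp


@jax.jit
def f(x):
    # Python print runs only at trace time (once per input shape/dtype).
    print('tracing f')
    # jax.debug.print is staged into the compiled function and runs every call.
    jax.debug.print("x = {x}", x=x)
    return jnp.square(x).sum()


y1 = f(jnp.ones(3))       # prints 'tracing f' and the debug line
y2 = f(2 * jnp.ones(3))   # same shape/dtype, so no retrace: only the debug line
```

This also explains why the `debug.print` output can appear after `step ends` in the log: the compiled computation is dispatched asynchronously, so its runtime prints are not interleaved with Python-side prints.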
The only variable outputting NaN values from the start is `pipeline_state.q[:7]`, which comes from `pipeline_state = self.pipeline_step(state.pipeline_state, action)`, while the other variables are assigned before `pipeline_step()` is called. That means the NaN error originates inside `pipeline_step()`.
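One standard way to localize where the first NaN is produced inside a jitted function such as `pipeline_step()` is JAX's built-in NaN checker, which re-runs the offending operation un-optimized and raises at the first NaN. A minimal sketch (not Brax-specific; `bad` is a hypothetical stand-in for the failing function):

```python
import jax
import jax.numpy as jnp

# When enabled, JAX checks every computation's output for NaNs and raises a
# FloatingPointError pointing at the first primitive that produced one.
jax.config.update("jax_debug_nans", True)


@jax.jit
def bad(x):
    return jnp.sqrt(x)  # produces NaN for negative input


caught = False
try:
    bad(jnp.array(-1.0)).block_until_ready()
except FloatingPointError:
    caught = True

# Restore the default so the flag does not slow down later runs.
jax.config.update("jax_debug_nans", False)
```

Note that this check slows execution considerably, so it is meant for debugging runs only.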
Next, I tested whether the NaN values are caused by the input parameter `action`. For this test a constant action of `-0.1 * jp.ones(env.sys.nu)` is fed in:
```python
# initialize the state
state = jit_reset(jax.random.PRNGKey(0))
rollout = [state.pipeline_state]

# grab a trajectory
for i in range(10):
    ctrl = -0.1 * jp.ones(env.sys.nu)
    state = jit_step(state, ctrl)
    rollout.append(state.pipeline_state)

media.show_video(env.render(rollout), fps=1.0 / env.dt)
```
NaN values are still present in the printed output:
```
reset
step begins
step ends
vec_1:[-0.362 0.35 -1.001]
reward_near:-1.1205109357833862
q:[nan nan nan nan nan nan nan]
goal_pos:[ 0.65 -0.15 -0.324]
vec_1:[nan nan nan]
reward_near:nan
q:[nan nan nan nan nan nan nan]
goal_pos:[nan nan nan]
vec_1:[nan nan nan]
reward_near:nan
q:[nan nan nan nan nan nan nan]
goal_pos:[nan nan nan]
vec_1:[nan nan nan]
reward_near:nan
q:[nan nan nan nan nan nan nan]
goal_pos:[nan nan nan]
vec_1:[nan nan nan]
reward_near:nan
q:[nan nan nan nan nan nan nan]
goal_pos:[nan nan nan]
vec_1:[nan nan nan]
reward_near:nan
...
vec_1:[nan nan nan]
reward_near:nan
q:[nan nan nan nan nan nan nan]
goal_pos:[nan nan nan]
```
So this rules out the input parameter `action` as the cause; the NaNs are produced inside `pipeline_step()` itself.
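To narrow down which simulation step first goes non-finite, one could also scan the collected rollout offline. A small helper sketch (the `rollout` list and `pipeline_state.q` field follow the snippet above; the helper name is hypothetical):

```python
import jax.numpy as jnp


def first_nonfinite_step(qs):
    """Return the index of the first array with any non-finite entry, or -1."""
    for i, q in enumerate(qs):
        if not bool(jnp.isfinite(q).all()):
            return i
    return -1


# Usage with the rollout collected above (hypothetical):
# step_idx = first_nonfinite_step([ps.q for ps in rollout])
```

In the log above this would report step 1, i.e. the very first call to `pipeline_step()`, after which the NaNs propagate to every downstream quantity (`vec_1`, `reward_near`, `goal_pos`).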
Closing this issue since a duplicate one was opened https://github.com/google-deepmind/mujoco/issues/1484
Hi,
I used the pusher example env with a small modification to import the MJCF of a no-hand Panda robot arm. The training code remained unchanged, the same as the pusher example in the Colab notebook. But the episode reward is NaN when plotting the training result, as shown in the screenshot below. After the plot is drawn the first time, without any curve on it, no further plot is produced. (The screenshot shows the output at 9 min.) However, the GPU keeps running the whole time, so the code is not idle.
![Screenshot from 2024-03-06 12-36-40](https://github.com/google/brax/assets/107070290/4fac69c7-d272-4195-94cd-5658a9556fd4)
According to the prints at the start and end of reset() and step(), execution appears to freeze after the first step(), so the problem arises during the training process. I'm sure there are no operations in my own code, such as division by zero, that would typically cause a NaN error.
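In physics rollouts, NaNs often come from an unstable integration step rather than an explicit division by zero: actions outside the actuator limits or too large a timestep can blow up the state. One cheap sanity check is to clamp actions to per-actuator limits before stepping. A hedged sketch (the helper and the shape of `ctrl_range` are assumptions, not Brax's own API):

```python
import jax.numpy as jnp


def clip_action(action, ctrl_range):
    """Clamp actions to per-actuator (min, max) limits.

    ctrl_range is assumed to have shape (nu, 2): column 0 holds the lower
    bounds, column 1 the upper bounds.
    """
    return jnp.clip(action, ctrl_range[:, 0], ctrl_range[:, 1])


# Example: three actuators, all limited to [-1, 1].
a = clip_action(jnp.array([-2.0, 0.5, 3.0]),
                jnp.array([[-1.0, 1.0]] * 3))
```

If clamping makes the NaNs disappear, the policy was driving the simulation outside its stable regime; otherwise the instability lies in the model or timestep itself.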
The XML file for the Panda robot used is below.
Could someone please help me with this issue? Thanks