Those `nan`s suggest that either an observation or the network's parameters contain `nan`. If the latter is the case, I would recommend checking the loss values for each update to see how training becomes unstable and leads to `nan`.
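A quick way to run that check (a minimal sketch, assuming a PyTorch policy; `model` and `obs` are placeholders for your network and an observation batch):

```python
import numpy as np
import torch

def report_nonfinite(model: torch.nn.Module, obs: np.ndarray) -> None:
    # Flag any parameter tensor that contains nan/inf.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            print(f"non-finite values in parameter {name}")
    # Flag nan/inf in the observation batch itself.
    if not np.isfinite(obs).all():
        print("non-finite values in observations")
```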
It turns out that I had `nan`s in my observations. Closing.
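For anyone hitting the same thing: a fail-fast observation check can catch this at the source. A minimal sketch (`AssertFiniteObs` is an illustrative name, not part of PFRL or Gym):

```python
import gym
import numpy as np

class AssertFiniteObs(gym.ObservationWrapper):
    # Raises as soon as the wrapped env emits a nan/inf observation.
    def observation(self, observation):
        assert np.isfinite(observation).all(), "env returned a non-finite observation"
        return observation
```

Wrapping each of the environments, e.g. `env = AssertFiniteObs(make_env())`, pinpoints the bad observation immediately instead of failing later inside the sampler.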
I have a custom environment implemented with the Gym API. It has 3-channel image observations and 4 actions. I'm training PPO with a CNN-based policy network, and I get a CUDA error when sampling from the `SoftmaxCategoricalHead`. The error happens at a different step each run even though I'm using `pfrl.utils.set_random_seed(args.seed)`. The error with `CUDA_LAUNCH_BLOCKING=1` is below:
My network:
where `IMPALACNN` can be seen here.
As far as I understand, the issue comes from sampling with infinite logits, but I don't know why the last linear layer would produce infinite values.
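For reference, the failure mode is easy to reproduce in isolation. A minimal sketch, independent of my environment and of PFRL:

```python
import torch
from torch.distributions import Categorical

# A single non-finite logit makes the softmax-normalized probs nan,
# and sampling from nan probs fails. On CPU this raises a RuntimeError;
# on CUDA the same condition shows up as a device-side assert.
logits = torch.tensor([[float("inf"), 0.0, 0.0, 0.0]])
dist = Categorical(logits=logits)
print(dist.probs)  # nan entries
dist.sample()      # fails: probability tensor contains inf/nan
```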
Edit: I printed out the `probs` and the `logits` of the Categorical distribution, and it seems the policy network produces NaN values for a single batch of observation data. I double-checked my environment and it doesn't return any NaN or inf values in the observations. The strange thing is that the policy network returns NaN for every environment. I'm training with 16 environments on GPU. Here is the output of `probs` and `logits` before the error: