itstyren opened this issue 1 year ago
I get the same issue, except I only get one NaN data point in total instead of one per step. If I run the evaluation script, however, it reports positive rewards for the agents, so I am also wondering what is going on there.
How many workers are you using for this experiment? I have only seen this problem when there is a mismatch between train_batch_size, num_workers, and sgd_minibatch_size. Check whether you get the same error if you reduce sgd_minibatch_size to 8 or 16.
Thanks for your reply.
> How many workers are you using for this experiment?
In this experiment, I have kept the default number of workers, num_workers=2.
> Check whether you get the same error if you reduce sgd_minibatch_size to 8 or 16.
It seems that reducing the sgd_minibatch_size below 20 is not feasible due to the following error raised by RLlib:
ValueError: `sgd_minibatch_size` (16) cannot be smaller than `max_seq_len` (20).
To investigate whether there is a mismatch between train_batch_size, num_workers, and sgd_minibatch_size, I went back to the provided default settings:
"rollout_fragment_length": 10,
"train_batch_size": 400,
"sgd_minibatch_size": 32,
Executed with the command python baselines/train/run_ray_train.py --num_gpus 1 --wandb True, the issue still persists, as shown in this wandb report.
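For reference, the relationship between these three settings can be sanity-checked with a little arithmetic. This is just a sketch; num_envs_per_worker is an assumption here (RLlib's default is 1):

```python
# Hypothetical sanity check of the batch-size arithmetic for the default
# settings above (num_envs_per_worker assumed to be RLlib's default of 1).
num_workers = 2
num_envs_per_worker = 1
rollout_fragment_length = 10
train_batch_size = 400
sgd_minibatch_size = 32

# Env steps collected per sampling round across all workers.
steps_per_round = num_workers * num_envs_per_worker * rollout_fragment_length
print(steps_per_round)                          # 20
print(train_batch_size % steps_per_round == 0)  # True: 400 is a clean multiple
print(sgd_minibatch_size <= train_batch_size)   # True
```

So with these defaults the sample counts line up cleanly, which suggests the NaN rewards come from somewhere other than a raw size mismatch.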
Ah, correct about the max_seq_len. You can change that by adding it to your config too, but sgd_minibatch_size should always be greater than or equal to max_seq_len.
Could you change how long you train? So, change to 10 iterations or so and see if you get NaN for all iterations or only for the first few?
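As a sketch of where max_seq_len can be overridden: RLlib nests it under the "model" sub-config. The values below are illustrative, not the baseline's actual settings:

```python
# Sketch of overriding max_seq_len so that sgd_minibatch_size >= max_seq_len.
# Values are illustrative; only the key layout follows RLlib's convention of
# nesting max_seq_len under "model".
config = {
    "rollout_fragment_length": 10,
    "train_batch_size": 400,
    "sgd_minibatch_size": 16,
    "model": {
        "max_seq_len": 16,  # lowered from the default 20 so sgd_minibatch_size=16 is allowed
    },
}
assert config["sgd_minibatch_size"] >= config["model"]["max_seq_len"]
```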
Thanks, @rstrivedi. I can confirm that this problem persists even during longer runs, as demonstrated in this report.
However, I just noticed a discrepancy between num_agent_steps_sampled and num_env_steps_sampled under the default settings:

| Metric | Value |
|---|---|
| num_agent_steps_sampled | 3,200 |
| num_agent_steps_trained | 3,200 |
| num_env_steps_sampled | 400 |
| num_env_steps_trained | 400 |
When I modify train_batch_size to 3200, I am able to obtain accurate episode rewards. I'm uncertain if this is where the issue originates. Do you have any suggestions or insights regarding this matter? Thanks!
Problem: I am encountering an issue while running the MeltingPot baseline Ray training model. The episode rewards I am getting are consistently NaN (Not-a-Number).
Steps to Reproduce:
python baselines/train/run_ray_train.py --num_gpus 1 --wandb True
The training args are set to the defaults. Please let me know if there's any additional information or logs needed to diagnose this issue. Thank you for your assistance in resolving this problem.