microsoft / AutonomousDrivingCookbook

Scenarios, tutorials and demos for Autonomous Driving
MIT License

DistributedRL training - Loss value is so high and not coming down #87

Open kalum84 opened 5 years ago

kalum84 commented 5 years ago

Problem description

The loss values are very high and are not coming down over time.

Problem details

We are trying to create a racing environment and use reinforcement learning to train a model to race in it, so we started from this example. We wanted to test how much time it takes to train a model and how far it can get. I used the same parameters as in the example, except for the following one:

   max_epoch_runtime_sec = 30

I also didn't change the code. I have attached the output file from one agent. Please help me troubleshoot what the issue is.
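For reference, this is roughly the only deviation from the example configuration; a minimal sketch, assuming the agent parameters are collected in a simple dictionary (the dictionary itself is an illustration, not the tutorial's actual launch code):

```python
# Hypothetical parameter override: only max_epoch_runtime_sec differs from the example.
agent_parameters = {
    "max_epoch_runtime_sec": 30,  # changed from the example's default value
    # every other parameter kept at the value used in the example
}
```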

Experiment/Environment details

I used the existing weights to start with and began training on Azure with 6 NV6 machines: 5 agents and the trainer. While the job was running I restarted the agents after some time (after 12 h), then ran the training for another 20 h. Output log from one agent: agent1.txt
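To check whether the loss is genuinely flat or just noisy, it can help to plot the loss values from the agent output. A minimal sketch, assuming the loss is printed one value per line in a form like `Loss: 1234.5`; the file name and the regex are assumptions and will need to match the actual format of agent1.txt:

```python
import re
import matplotlib.pyplot as plt

# Hypothetical log format: adjust the pattern to whatever agent1.txt actually prints.
loss_pattern = re.compile(r"[Ll]oss[:=]\s*([0-9]+(?:\.[0-9]+)?)")

losses = []
with open("agent1.txt") as log_file:
    for line in log_file:
        match = loss_pattern.search(line)
        if match:
            losses.append(float(match.group(1)))

plt.plot(losses)
plt.xlabel("Logged training iteration")
plt.ylabel("Loss")
plt.title("Loss trend from agent1.txt")
plt.show()
```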

mitchellspryn commented 5 years ago

We discussed a bit offline, but this paper might be of interest to you.

The algorithm as written does not scale indefinitely. Try 3 or 4 machines.

Also, the model will overfit - there is no concept of early stopping. Try checking back on it after an hour or an hour and a half.
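Since the training loop has no early stopping, one workaround is to snapshot weights periodically and keep the checkpoint with the best evaluation reward rather than the last one. A minimal sketch, assuming a Keras-style `model.save_weights()` and a hypothetical `evaluate_reward` callable; neither hook exists in the tutorial code as written:

```python
import os

class BestCheckpointSaver:
    """Keep only the weights with the best evaluation reward seen so far.

    `model` is assumed to expose a Keras-style save_weights(), and
    `evaluate_reward` is a hypothetical callable that runs a few evaluation
    episodes and returns their average reward.
    """

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = checkpoint_dir
        self.best_reward = float("-inf")
        os.makedirs(checkpoint_dir, exist_ok=True)

    def maybe_save(self, model, evaluate_reward):
        # Evaluate the current weights and persist them only if they improve on the best so far.
        reward = evaluate_reward(model)
        if reward > self.best_reward:
            self.best_reward = reward
            model.save_weights(os.path.join(self.checkpoint_dir, "best_weights.h5"))
        return reward
```

Calling `maybe_save` from the trainer every so often (e.g. every 30-60 minutes) gives a usable checkpoint even if the loss later diverges.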