ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[tune] Scheduler to skip a trial when model returns NaN predictions? #8671

Closed PoCk3T closed 4 years ago

PoCk3T commented 4 years ago

Environment:

Hello everyone. First of all, a big thank you to the community and the developers: this is a masterpiece of a framework, and I can't imagine how I would have learned about RL without projects like this or Stable Baselines.

Question: for some trials, the randomly chosen hyperparameters are not a good fit for the model, which ends up producing NaN actions for my environment. How do I tell the scheduler, e.g. the Population Based one, to not bother and move on to the next trial?

Solutions attempted so far:

  1. The stupid solution: in my custom gym environment, return 0 as the reward for such a NaN action, but Tune wastes a lot of time going through all the steps only to conclude the trial was a dead end
  2. Raise an exception in my environment when facing a NaN action, plus a wrapper around the scheduler to catch it:
    
    from ray.tune.schedulers import PopulationBasedTraining
    from ray.tune.schedulers import TrialScheduler

    class FaultToleranceForPopulationBasedTraining(PopulationBasedTraining):
        def on_trial_error(self, trial_runner, trial):
            return TrialScheduler.STOP
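The "raise an exception on NaN" part of approach 2 can be sketched as a small guard inside the environment; `NaNActionError` and `check_action` are hypothetical names for illustration, not Ray or Gym APIs:

```python
import math

class NaNActionError(RuntimeError):
    """Raised when the policy emits a NaN action (hypothetical helper)."""

def check_action(action):
    # Called at the top of env.step(): fail fast instead of
    # wasting a full episode on a doomed trial.
    if any(math.isnan(float(a)) for a in action):
        raise NaNActionError("NaN action received from the policy")
    return action
```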



The problem with attempted solution 2 is that the whole tune.run() crashes on the following exception:

`ray.tune.error.TuneError: ('Trials did not complete', [PPO_MyCustomGymEnv-v1_00000])`

I couldn't find any similar issues on the Ray GitHub, but I'm sure I'm not the first one facing this kind of situation. How did you guys tackle it?

Thanks in advance to anyone who wants to share some experience :)
Lucas
richardliaw commented 4 years ago

If you can perturb the returned result dict, you can send {"done": True} in the result dict and it should terminate your trial.
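A minimal sketch of that idea: factor the NaN check into a helper that builds the result dict, then hand its output to the reporting call (e.g. `tune.report(**finalize_result(loss))`). `finalize_result` is a hypothetical name for illustration, not a Ray API:

```python
import math

def finalize_result(loss):
    # Build the result dict for Tune. When the loss is NaN, mark the
    # trial done so the scheduler terminates it instead of running
    # the remaining training steps.
    if math.isnan(loss):
        return {"loss": float("inf"), "done": True}
    return {"loss": loss, "done": False}
```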

PoCk3T commented 4 years ago

Thanks a lot for the super prompt feedback Richard, I will try that! (I think a wrapper on a scheduler is not appropriate; I will try to intercept the result dict from a custom callback implementation instead.)

PoCk3T commented 4 years ago

Thanks again Richard for the tip on {'done': True}, it worked for me. Here's the solution I implemented, for everyone:

sjiang95 commented 1 year ago

My solution following Stopping and Resuming a Tune Run:

import math

def stopnanloss(trial_id, result):
    return math.isnan(result["loss"])

Pass the custom function to air.RunConfig:

...
tuner = tune.Tuner(
    my_trainable,
    run_config=air.RunConfig(stop=stopnanloss),
)
results = tuner.fit()
...
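For reference, the same stopper idea generalizes to halting a trial when any reported metric comes back as NaN, not just "loss" (a sketch, not from the thread; it keeps the same `(trial_id, result)` signature that Tune expects for a stop function):

```python
import math

def stop_on_any_nan(trial_id, result):
    # Stop the trial as soon as any float metric in the
    # result dict reports NaN.
    return any(
        isinstance(v, float) and math.isnan(v)
        for v in result.values()
    )
```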

Output example:

Result for tune_with_parameters_32fe0_00006:
    date: 2023-03-16_11-34-03
    done: true
    loss: .nan
    experiment_id: 64de5b9cece94771906b59add38e4d11
    hostname:
    iterations_since_restore: 11
    node_ip:
    pid: 2090409
    time_since_restore: 776.1911494731903
    time_this_iter_s: 70.30889534950256
    time_total_s: 776.1911494731903
    timestamp: 1678934043
    timesteps_since_restore: 0
    training_iteration: 11
    trial_id: 32fe0_00006
    warmup_time: 0.003095865249633789

Trial tune_with_parameters_32fe0_00006 completed.