ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[tune] Scheduler to skip a trial when model returns NaN predictions? #8671

Closed PoCk3T closed 4 years ago

PoCk3T commented 4 years ago

Environment:

Hello everyone. First of all, a big thank you to the community and the developers: this is a masterpiece of a framework, and I can't imagine how I would have learned about RL without projects like this or Stable Baselines.

Question: for some trials, the randomly chosen hyperparameters are not a good fit for the model, which ends up producing NaN actions for my environment. How do I tell the scheduler, e.g. the Population Based one, to not bother and move on to the next trial?

Solutions attempted so far:

  1. The stupid solution: in my custom gym environment, return 0 as the reward for such a NaN action, but Tune wastes a lot of time going through all the steps only to conclude the trial was a dead end
  2. Raise an exception in my environment when facing a NaN action, plus a wrapper around the scheduler to catch it:
    
    from ray.tune.schedulers import PopulationBasedTraining
    from ray.tune.schedulers import TrialScheduler

    class FaultToleranceForPopulationBasedTraining(PopulationBasedTraining):
        def on_trial_error(self, trial_runner, trial):
            return TrialScheduler.STOP
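The "raise an exception on NaN" part of approach 2 can be sketched as a small guard inside the environment; `NaNActionError` and `check_action` are hypothetical names for illustration, not Ray or Gym APIs:

```python
import math

class NaNActionError(RuntimeError):
    """Raised when the policy emits a NaN action (hypothetical helper)."""

def check_action(action):
    # Called at the top of env.step(): fail fast instead of
    # wasting a full episode on a doomed trial.
    if any(math.isnan(float(a)) for a in action):
        raise NaNActionError("NaN action received from the policy")
    return action
```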



The problem with attempted solution 2 is that the whole tune.run() crashes on the following exception:

`ray.tune.error.TuneError: ('Trials did not complete', [PPO_MyCustomGymEnv-v1_00000])`

I couldn't find any similar issues on the Ray GitHub, but I'm sure I'm not the first one facing this kind of situation. How did you guys tackle it?

Thanks in advance to anyone who wants to share some experience :)
Lucas
richardliaw commented 4 years ago

If you can perturb the returned result dict, you can send {"done": True} in the result dict and it should terminate your trial.
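A minimal sketch of that idea: factor the NaN check into a helper that builds the result dict, then hand its output to the reporting call (e.g. `tune.report(**finalize_result(loss))`). `finalize_result` is a hypothetical name for illustration, not a Ray API:

```python
import math

def finalize_result(loss):
    # Build the result dict for Tune. When the loss is NaN, mark the
    # trial done so the scheduler terminates it instead of running
    # the remaining training steps.
    if math.isnan(loss):
        return {"loss": float("inf"), "done": True}
    return {"loss": loss, "done": False}
```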

PoCk3T commented 4 years ago

Thanks a lot for the super prompt feedback Richard, I will try that! (I think a wrapper on a scheduler is not appropriate; I will try to intercept the result dict from a custom callback implementation instead.)

PoCk3T commented 4 years ago

Thanks again Richard for the tip on {'done': True}, it worked for me. Here's the solution I implemented, for everyone:

sjiang95 commented 1 year ago

My solution following Stopping and Resuming a Tune Run:

import math

def stopnanloss(trial_id, result):
    return math.isnan(result["loss"])

Pass the custom function to air.RunConfig:

...
tuner = tune.Tuner(
    my_trainable,
    run_config=air.RunConfig(stop=stopnanloss),
)
results = tuner.fit()
...
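For reference, the same stopper idea generalizes to halting a trial when any reported metric comes back as NaN, not just "loss" (a sketch, not from the thread; it keeps the same `(trial_id, result)` signature that Tune expects for a stop function):

```python
import math

def stop_on_any_nan(trial_id, result):
    # Stop the trial as soon as any float metric in the
    # result dict reports NaN.
    return any(
        isinstance(v, float) and math.isnan(v)
        for v in result.values()
    )
```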

Output example:

Result for tune_with_parameters_32fe0_00006:
    date: 2023-03-16_11-34-03
    done: true
    loss: .nan
    experiment_id: 64de5b9cece94771906b59add38e4d11
    hostname:
    iterations_since_restore: 11
    node_ip:
    pid: 2090409
    time_since_restore: 776.1911494731903
    time_this_iter_s: 70.30889534950256
    time_total_s: 776.1911494731903
    timestamp: 1678934043
    timesteps_since_restore: 0
    training_iteration: 11
    trial_id: 32fe0_00006
    warmup_time: 0.003095865249633789

Trial tune_with_parameters_32fe0_00006 completed.