ray-project / ray_lightning

Pytorch Lightning Distributed Accelerators using Ray

Early stopping #22

Closed athenawisdoms closed 3 years ago

athenawisdoms commented 3 years ago

What is the proper way to enable early stopping?

Do we use Lightning's EarlyStopping callback?

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
from ray.tune.integration.pytorch_lightning import TuneReportCallback
from ray_lightning import RayPlugin

def train_mnist(config):

    model = MNISTClassifier(config)

    metrics = {"loss": "ptl/val_loss"}
    callbacks = [
        TuneReportCallback(metrics, on="validation_end"),
        # Lightning's EarlyStopping callback; monitor must be a metric the model logs
        EarlyStopping(monitor='ptl/val_loss', patience=10),
    ]

    trainer = pl.Trainer(
        max_epochs=4,
        callbacks=callbacks,
        plugins=[RayPlugin(num_workers=4, use_gpu=False)])
    trainer.fit(model)

Or should we use one of tune.run()'s parameters?

from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler

# metric/mode are passed to tune.run() below; setting them on the
# scheduler as well would conflict with those arguments.
async_hb_scheduler = AsyncHyperBandScheduler(
    time_attr='training_iteration',
    max_t=100,
    grace_period=10,
    reduction_factor=3,
    brackets=3)

analysis = tune.run(
      train_mnist,
      metric="loss",
      mode="min",
      config=config,
      num_samples=num_samples,
      resources_per_trial={
          "cpu": 1,
          "extra_cpu": 4
      },
      scheduler=async_hb_scheduler,                 # Add Tune's trial scheduler here
      name="tune_mnist")

or

analysis = tune.run(
      train_mnist,
      metric="loss",
      mode="min",
      config=config,
      num_samples=num_samples,
      resources_per_trial={
          "cpu": 1,
          "extra_cpu": 4
      },
      stop={"training_iteration": 10},  # stop after 10 reported results (one per validation epoch here)
      name="tune_mnist")
amogkam commented 3 years ago

It should work with either, though Tune provides more early stopping algorithms and generally more flexibility. Just make sure you are not using both at once, since the two stopping mechanisms can interfere with each other.
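
For reference, here is a minimal sketch of the Tune-only route (an ASHAScheduler doing the early stopping, with no EarlyStopping callback inside the Trainer). It assumes the same MNISTClassifier, config, and num_samples as in the snippets above:

import pytorch_lightning as pl
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.integration.pytorch_lightning import TuneReportCallback
from ray_lightning import RayPlugin

def train_mnist(config):
    # MNISTClassifier is assumed to log "ptl/val_loss" during validation
    model = MNISTClassifier(config)
    trainer = pl.Trainer(
        max_epochs=10,
        callbacks=[TuneReportCallback({"loss": "ptl/val_loss"}, on="validation_end")],
        plugins=[RayPlugin(num_workers=4, use_gpu=False)])
    trainer.fit(model)

analysis = tune.run(
    train_mnist,
    metric="loss",
    mode="min",
    config=config,
    num_samples=num_samples,
    resources_per_trial={"cpu": 1, "extra_cpu": 4},
    # ASHA terminates underperforming trials early based on the reported "loss";
    # metric/mode are taken from tune.run()
    scheduler=ASHAScheduler(max_t=10, grace_period=1),
    name="tune_mnist")

With this setup, each Trainer runs to max_epochs unless Tune decides to stop the trial first.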