mir-group / nequip

NequIP is a code for building E(3)-equivariant interatomic potentials
https://www.nature.com/articles/s41467-022-29939-5
MIT License

❓ [QUESTION] Sweeping hyperparameters with Weights and Biases #296

Closed rschireman closed 1 year ago

rschireman commented 1 year ago

Hi all,

I'm relatively new to the code here. Is there a way to use Weights and Biases to sweep hyperparameters (batch size, etc.)? I've been using the following code:

sweep_id = wandb.sweep(sweep_config, project="sweep")
trainer = TrainerWandB(model=model,**dict(minimal_config))
trainer.save()
trainer.set_dataset(dataset)
wandb.agent(sweep_id, trainer.train(), count=5)

and my sweep_config looks like this:

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'validation_e'},
 'parameters': {'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 256,
                               'min': 32,
                               'q': 8},
                'learning_rate': {'distribution': 'uniform',
                                  'max': 0.1,
                                  'min': 0}}}

The first iteration runs fine, but the subsequent runs fail.

Hongyu-yu commented 1 year ago

Hi @rschireman, sure, with a small modification to the source code you can sweep hyperparameters. In nequip/scripts/train.py line 25, change the default run_name="NequIP", to run_name="rand_" + str(uuid.uuid4()), and insert import uuid at the top of the file, so that the sweep generates a random run_name for each run. You can either change the code directly in your environment or modify the source and reinstall it with pip install ..
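
A sketch of that edit (the line number may differ between NequIP versions; the "rand_" prefix is just a naming choice):

import uuid  # added at the top of nequip/scripts/train.py

# Before: the default run name was the fixed string
#   run_name="NequIP",
# After: every sweep run gets a unique name, so runs don't collide:
run_name="rand_" + str(uuid.uuid4()),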

Then you can create two config files: one for the sweep hyperparameters and one for the base training config.

For sweep_config.yaml, here is an example:

project: {your project name}
program: nequip-train
method: random
name: sweep_1
description: sweep over loss coefficients and batch size

# metric:
#   goal: minimize
#   name: validation_f_rmse

command:
  - ${env}
  - ${program}
  - ".../base.yaml" # Absolute path of your basic config yaml file

parameters:
  batch_size:
    distribution: int_uniform
    min: 1
    max: 5
  loss_coeffs:
    parameters:
      stress:
        distribution: uniform
        min: 1.0
        max: 10.0
      total_energy:
        value:
          - 1.
          - PerAtomMSELoss
      forces:
        distribution: uniform
        min: 1
        max: 10
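
With both files in place, the sweep can be launched with the standard wandb CLI (the first command prints the sweep ID that the second one takes):

wandb sweep sweep_config.yaml
wandb agent {your entity}/{your project name}/{sweep ID}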

Hope it helps!

Linux-cpp-lisp commented 1 year ago

Thanks very much for posting this info @Hongyu-yu !

An alternative, without modifying the source, would be to make your wandb program a simple Python script that gets the JSON config of the sweep hyperparameters, reads the base config, and writes a new YAML file merging the two for the specific run. This script could be re-used for all sweeps.
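
A minimal sketch of such a wrapper, assuming PyYAML is installed; the file names (base.yaml, the per-run output) and the shallow merge over top-level keys are illustrative choices, not NequIP API:

# sweep_wrapper.py -- hypothetical `program` for the wandb sweep.
# wandb.agent populates wandb.config with this run's hyperparameters;
# we merge them over the base config and hand the result to nequip-train.
import subprocess
import uuid

import wandb
import yaml

run = wandb.init()

with open("base.yaml") as f:  # path to your base config (adjust)
    config = yaml.safe_load(f)

# Overwrite base values with this run's sweep choices (top-level keys only).
config.update(dict(run.config.items()))

# Unique run_name so runs don't collide in the same training directory.
config["run_name"] = "rand_" + str(uuid.uuid4())

run_config = f"run_{run.id}.yaml"
with open(run_config, "w") as f:
    yaml.safe_dump(config, f)

subprocess.run(["nequip-train", run_config], check=True)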

If that is the only modification needed to make it work, we could alternatively add a --uuid-run-name flag to nequip-train so this works out of the box. The only unpleasant thing about that is that restarting or finding training runs becomes a pain, since you have to go through each uuid-named directory looking for the right values in config.yaml.