Hi @rschireman,
Sure, with the small modification to the source code described below, you can sweep hyperparameters.
In `nequip/scripts/train.py`, line 25, change the default `run_name="NequIP"` to `run_name="rand_" + str(uuid.uuid4())`, and insert `import uuid` on the first line, so that the sweep generates a random `run_name` for each run, as sketched below.
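Concretely, the change looks roughly like this (the exact line number and surrounding code may differ between nequip versions):

```python
# nequip/scripts/train.py
import uuid  # new import at the top of the file

default_config = dict(
    # was: run_name="NequIP",
    run_name="rand_" + str(uuid.uuid4()),  # random name for every sweep run
    # ... remaining defaults unchanged ...
)
```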
You can either change the installed code in your environment directly, or edit the source checkout and reinstall it with `pip install .`.
Then create two config files: one for the sweep hyperparameters and one basic training config file. Here is an example `sweep_config.yaml`:
```yaml
project: {your project name}
program: nequip-train
method: random
name: sweep_1
description: sweep over loss coefficients and batch size
# metric:
#   goal: minimize
#   name: validation_f_rmse
command:
  - ${env}
  - ${program}
  - ".../base.yaml"  # absolute path of your basic config yaml file
parameters:
  batch_size:
    distribution: int_uniform
    min: 1
    max: 5
  loss_coeffs:
    parameters:
      stress:
        distribution: uniform
        min: 1.0
        max: 10.0
      total_energy:
        value:
          - 1.
          - PerAtomMSELoss
      forces:
        distribution: uniform
        min: 1
        max: 10
```
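With both files in place, the sweep follows the standard wandb workflow: `wandb sweep sweep_config.yaml` registers the sweep and prints its ID, and `wandb agent <entity>/<project>/<sweep-id>` (placeholders for your own entity, project, and sweep ID) starts pulling runs from it.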
Hope it helps!
Thanks very much for posting this info @Hongyu-yu!
An alternative, without modifying the source, would be to make your `program` for wandb a simple Python script that gets the JSON config of the sweep hyperparameters, reads the base config, and writes a new YAML file of the merged result for the specific run. This script could be reused for all sweeps; a sketch follows.
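A minimal sketch of such a wrapper, assuming wandb passes the sweep parameters as `--key=value` arguments (the default `${args}` behavior), that only top-level config keys are swept, and that `sweep_wrapper.py` and the `base.yaml` path are placeholder names:

```python
#!/usr/bin/env python
"""Hypothetical sweep wrapper: merge wandb sweep parameters into a base config.

Point the sweep's `program` at this script so each run trains from a
freshly merged config file.
"""
import subprocess
import sys
import uuid

import yaml  # PyYAML

BASE_CONFIG = "base.yaml"  # placeholder: absolute path of your basic config


def parse_overrides(argv):
    """Turn wandb-style '--key=value' arguments into a dict of overrides."""
    overrides = {}
    for arg in argv:
        key, _, raw = arg.lstrip("-").partition("=")
        # let YAML guess the type (int, float, list, str, ...)
        overrides[key] = yaml.safe_load(raw)
    return overrides


def main():
    with open(BASE_CONFIG) as f:
        config = yaml.safe_load(f)

    # apply the sweep's hyperparameters on top of the base config
    # (top-level keys only here; nested keys would need a deep merge)
    config.update(parse_overrides(sys.argv[1:]))

    # unique run_name so repeated sweep runs don't collide
    config["run_name"] = "rand_" + str(uuid.uuid4())

    run_config = f"config_{config['run_name']}.yaml"
    with open(run_config, "w") as f:
        yaml.safe_dump(config, f)

    subprocess.run(["nequip-train", run_config], check=True)


if __name__ == "__main__":
    main()
```

The sweep's `command` would then list `${env}`, `${interpreter}`, `${program}`, and `${args}` instead of the base config path.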
If that is the only modification needed to make it work, alternatively we could add a `--uuid-run-name` flag to `nequip-train` so this works out of the box. The only unpleasant thing about that is that restarting/finding training runs becomes a pain, since you have to go through each UUID-named directory looking for the right values in `config.yaml`.
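Purely as a sketch of what such a flag could look like (this is not current nequip behavior; the argparse wiring and config handling here are illustrative):

```python
# Hypothetical --uuid-run-name flag; not part of nequip today.
import argparse
import uuid

import yaml

parser = argparse.ArgumentParser()
parser.add_argument("config", help="path to the training YAML file")
parser.add_argument(
    "--uuid-run-name",
    action="store_true",
    help="append a random UUID to run_name so sweep runs don't collide",
)
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

if args.uuid_run_name:
    config["run_name"] = f"{config.get('run_name', 'NequIP')}_{uuid.uuid4()}"
```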
Hi all,
I'm relatively new to the code here. Is there a way to use Weights and Biases to sweep hyperparameters (like the batch size, etc.)? I've been using the following code:

and my `sweep_config` looks like this:

The first iteration runs fine, but the subsequent runs fail.