openmm / openmm-torch

OpenMM plugin to define forces with neural networks

Unphysical structures when using TorchForce as NNP #133

Closed · eva-not closed this issue 8 months ago

eva-not commented 8 months ago

Hi, following up from the previous issue in https://github.com/openmm/openmm-torch/issues/130.

Increasing the number of neighbours worked, and I can now run simulations with my trained models. However, the bonds always look very stretched, for example: [image: simulation snapshot showing overstretched bonds]

I was wondering if there is something wrong with my code or with the parameters used during training, or if the trained models are simply not good enough and better hyperparameters or more training data would fix the problem. Over the past week I have experimented with different numbers of neighbours, neighbour embedding, embedding dimension, number of layers, RBF number/type, and activation function, but I haven't yet managed to get a reasonable-looking trajectory.

Script for running simulations:

from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout
import numpy as np
import torch
from openmmtorch import TorchForce

prmtop = AmberPrmtopFile('chignolin_dry.prmtop')
inpcrd = AmberInpcrdFile('chignolin_dry.inpcrd')

# create the System (the MM forces are removed below)
system = prmtop.createSystem(nonbondedMethod=NoCutoff)

# remove MM forces
while system.getNumForces() > 0:
    system.removeForce(0)

# remove constraints
while system.getNumConstraints() > 0:
    system.removeConstraint(0)

# load the trained TorchMD-Net checkpoint and serialize it as TorchScript
from torchmdnet.models.model import load_model
model = load_model("train_layers_4_aligned_expnorm_rbf_18_neighbs_512/epoch=97-val_loss=609655.4375-test_loss=544.4850.ckpt")
torch.jit.script(model).save('test.pt')

class ForceModule(torch.nn.Module):
    """Wraps the TorchMD-Net model so TorchForce can call it."""
    def __init__(self, z):
        super().__init__()
        self.model = torch.jit.load('test.pt').cuda()
        self.z = z.cuda()

    def forward(self, positions):
        # TorchForce passes positions in nm; the model returns the energy y
        # (kJ/mol) and the forces neg_dy (kJ/mol/nm, i.e. -dy/dpositions)
        positions = positions.cuda()
        y, neg_dy = self.model(self.z, positions)
        return y, neg_dy

# get atomic numbers
z = []
for atom in prmtop.topology.atoms():
    z.append(atom.element.atomic_number)
z = torch.tensor(z, dtype=torch.long)

module = torch.jit.script(ForceModule(z))
torch_force = TorchForce(module)
# the module returns (energy, forces), not just the energy
torch_force.setOutputsForces(True)
system.addForce(torch_force)

integrator = LangevinMiddleIntegrator(298.15*kelvin, 1/picosecond, 0.001*picoseconds)
platform = Platform.getPlatformByName('CUDA')

simulation = Simulation(prmtop.topology, system, integrator, platform)
simulation.context.setPositions(inpcrd.positions)
simulation.minimizeEnergy()
simulation.reporters.append(DCDReporter('output_NNP.dcd', 100))
simulation.reporters.append(StateDataReporter(stdout, 100, step=True,
        potentialEnergy=True, temperature=True))
simulation.step(100000)
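
As a quick sanity check before a full run (a sketch, not part of the script above; it assumes the model takes positions in nm and returns kJ/mol), the scripted module can be evaluated once on the starting coordinates to eyeball the magnitudes:

# one forward pass on the starting coordinates, no MD
pos = torch.tensor(inpcrd.getPositions(asNumpy=True).value_in_unit(nanometer),
                   dtype=torch.float32)
y, neg_dy = module(pos)
print('energy (kJ/mol):', y.item())
print('max |force| (kJ/mol/nm):', neg_dy.abs().max().item())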

The explicit-solvent simulation used to generate the training data (coordinates in nm, forces in kJ/(mol·nm)) was run with constraints=HBonds and a 2 fs timestep. I have also tried aligning the coordinates to the first frame (protein_coordinates_aligned.npy), but that shouldn't affect things.

Training input npy files, chignolin topology/coordinate files for implicit solvent and some trained models are here: https://www.dropbox.com/scl/fo/niupqtkll5f6gn0u8tszc/h?rlkey=wzpbar2lt8rottjk26ms335k6&dl=0
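
For what it's worth, this is a quick way to double-check the units of the training arrays (a sketch; the shape convention (n_frames, n_atoms, 3) is an assumption):

import numpy as np

coords = np.load('protein_coordinates.npy')   # file names from the training yaml
forces = np.load('protein_forces.npy')
print(coords.shape, forces.shape)             # expect (n_frames, n_atoms, 3)
# coordinates in nm for a small protein should span only a few nm
print('coordinate range (nm):', coords.min(), coords.max())
# atomistic forces in kJ/(mol nm) are typically O(10^2 - 10^3)
print('RMS force (kJ/mol/nm):', float(np.sqrt((forces**2).mean())))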

RaulPPelaez commented 8 months ago

Hi, I am sorry but I do not understand the image. The code you provide looks correct, provided that the dataset you used has the correct units. You should not have to cast the positions to CUDA, but it's not incorrect. I am inclined to believe there is no error here; my guess is that the model is just not trained in a way that results in a stable MD simulation. I see you are using this yaml to train the model:

activation: silu
aggr: add
atom_filter: -1
attn_activation: silu
batch_size: 8
charge: false
coord_files: protein_coordinates.npy
cutoff_lower: 0.0
cutoff_upper: 9.0
dataset: Custom
dataset_arg: null
dataset_root: .
derivative: true
distance_influence: both
early_stopping_patience: 30
ema_alpha_neg_dy: 1.0
ema_alpha_y: 1.0
embed_files: charge_embeddings.npy
embedding_dimension: 128
energy_files: null
equivariance_invariance_group: O(3)
force_files: protein_forces.npy
gradient_clipping: 0.0
inference_batch_size: 8
load_model: null
log_dir: train_layers_4_expnorm_rbf_18_neighbs_512/
lr: 0.0001
lr_factor: 0.8
lr_metric: val_total_mse_loss
lr_min: 1.0e-06
lr_patience: 4
lr_warmup_steps: 0
max_num_neighbors: 512
max_z: 100
model: graph-network
neg_dy_weight: 1.0
neighbor_embedding: true
ngpus: -1
num_epochs: 100
num_heads: 8
num_layers: 4
num_nodes: 1
num_rbf: 18
num_workers: 8
output_model: Scalar
precision: 32
prior_model: null
rbf_type: expnorm
redirect: false
reduce_op: add
reset_trainer: false
save_interval: 1
seed: 1
spin: false
splits: null
standardize: false
tensorboard_use: false
test_interval: 1
test_size: 0.1
train_size: null
trainable_rbf: false
val_size: 0.05
wandb_name: training
wandb_project: training_
wandb_resume_from_id: null
wandb_use: false
weight_decay: 0.0
y_weight: 1.0

I do not have the intuition to discuss hparams; perhaps @antoniomirarchi and @gillemsimeon can help with that. You are using the graph-network, which is not equivariant. Perhaps you would find it easier to get stable simulations with tensornet, but again, I do not have that much intuition in this particular case. @antoniomirarchi, do the hparams in some of the TorchMD-Net examples result in stable simulations for chignolin?
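
If you want to try tensornet, the switch is only a couple of keys in the yaml (a sketch; the other hparams and their defaults differ between models and may need retuning):

model: tensornet
equivariance_invariance_group: O(3)  # honoured by tensornet, ignored by graph-network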

I see you are using a file called "charge_embeddings.npy". If these embeddings are not the atomic numbers, your model will not know what to do with the atomic numbers you are providing it with at simulation time.
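
For example, something like this would write atomic numbers as the embedding file instead (a sketch; the output file name is made up, and the expected shape of embed_files, one integer per atom, is an assumption on my part):

import numpy as np
from openmm.app import AmberPrmtopFile

# write atomic numbers (not charges) as the per-atom embedding file
prmtop = AmberPrmtopFile('chignolin_dry.prmtop')
z = np.array([atom.element.atomic_number for atom in prmtop.topology.atoms()],
             dtype=np.int64)
np.save('atomic_number_embeddings.npy', z)  # hypothetical name; point embed_files here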

OTOH, you are not providing energy_files, but you are setting y_weight to 1.0. I am not sure what your intention was by doing this; IMO this should be an error. Currently the code lets you do this, essentially ignoring the value of y_weight and training only with forces. Was this your goal?

eva-not commented 8 months ago

Thanks for the reply. A stable protein should look something like this instead: [image: snapshot of a stable, intact protein structure]

but when a torchmd-net model is used instead of a classical force field, the bonds are overstretched, resulting in unphysical conformations.

I have also trained models with the atomic numbers as embeddings, but I still get similar results. The idea is to train a GNN on coordinates and forces, similar to this paper https://www.nature.com/articles/s41467-023-41343-1, where GNNs for coarse-grained simulations were trained with torchmd-net on coordinates and forces, and one of the proteins studied was chignolin. In my case, however, I am trying to train on explicit solvent and use the model to run atomistic simulations in implicit solvent.

What should the value of y_weight be? Or would it make more sense to train with energies instead?

I was under the impression that the GN is equivariant because of this option in the input yaml file: equivariance_invariance_group: O(3).

AntonioMirarchi commented 8 months ago

Hi, a few things here:

RaulPPelaez commented 8 months ago

Thanks for the comparison image. Now it's clear. To me it looks like either a units problem or just hparams; probably the latter.

I was under the impression that the GN is equivariant because of this option in the input yaml file: equivariance_invariance_group: O(3).

It is not; that option is specific to TensorNet, and the graph network will just ignore it. This should result in at least a warning, I'll give you that. I am scared of making it an error, since that would break backwards compatibility. https://torchmd-net.readthedocs.io/en/latest/models.html#graph-network

What should the value of y_weight be? Or would it make more sense to train with energies instead?

If you are trying to train only on forces, your yaml is fine; torchmd-net will ignore the value of y_weight when no energy_files is provided. I cannot give you much wisdom on whether you should train on energies, beyond saying that my current intuition is that forces+energies works best. However, the choice of dataset is very important too.
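
If you do decide to add energies, the yaml change would look something like this (a sketch; protein_energies.npy is a hypothetical file of per-frame energies, presumably in kJ/mol to match your forces):

energy_files: protein_energies.npy  # hypothetical file name
y_weight: 1.0
neg_dy_weight: 1.0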

If you are looking for a reproducer, you should be able to reproduce the results of the paper you linked using the yamls they provide (aside from possible updates to some parameter names). As a matter of fact, I believe @AntonioMirarchi has done this before.

AntonioMirarchi commented 8 months ago

Yes, I was able to reproduce those results.

eva-not commented 8 months ago

Thanks for the advice. I'm pretty sure the units are correct.

I will first have a go at reproducing the CG simulations in OpenMM before attempting to run atomistic simulations. Closing this for now.