Expose properties from model other than forces and energies?

orionarcher commented 1 month ago

Some MLFF models expose properties other than forces and energies. For example, CHGNet also predicts the magnitude of the magnetic moment. Is there a way to access these values through OpenMM or OpenMM-Torch while the simulation is running?

peastman commented 1 month ago

The model you provide to OpenMM-Torch has energy as its only output, or possibly also forces if you request that. By construction there are no other outputs. You might have implemented the model using code that can also compute other quantities, but they aren't returned.

orionarcher commented 1 month ago

Thank you, good point. My question was a bit ill-phrased.

If the underlying torch.nn.Module exposes additional properties, like below, can I access the module once OpenMM is running? If not, is there a path to implementing that feature or is it out of scope?

I'd love to use OpenMM and for my use case having access to other model outputs will be important.

class ForceModule(torch.nn.Module):
    """A central harmonic potential that computes energy and writes my_property."""
    def forward(self, positions):

        self.my_property = torch.mean(positions)

        return torch.sum(positions**2)

peastman commented 1 month ago

Currently there's no feature to expose that. It's an interesting idea. Can you describe your use case and how you would want it to work? What would you imagine the API looking like?

orionarcher commented 1 month ago

I'm working on materials science applications where magnetic, dielectric, and electronic properties are often essential. Incorporating these properties directly into MLFFs is still new (CHGNet is an early example), but it's increasingly common. I'd like to be able to observe and record those properties as the trajectory evolves.

The core problems that I foresee are allowing for asynchronous IO and supporting custom outputs, which is necessary since we don't know in advance what properties the model will predict. I can imagine a few approaches; these are just loose thoughts and could be mixed and matched.

1. Just leaving it to the user.

I could just write files in the forward pass of the ForceModule.

pros:

already done!

cons:

forces synchronous IO, slowing things down

2. Saving additional attributes as properties of the ForceModule and exposing the ForceModule through the TorchForce with a getter method.

class ForceModule(torch.nn.Module):
    """A central harmonic potential that computes energy and records my_property."""
    def __init__(self):
        super().__init__()
        self.my_property_list = []

    def forward(self, positions):

        # list.append returns None, so append in place instead of reassigning
        self.my_property_list.append(torch.mean(positions))

        return torch.sum(positions**2)

simulation.step(100)
torch_force = system.getForce(0)
module = torch_force.getForceModule()  # proposed getter
my_property_list = module.my_property_list
len(my_property_list) == 100

Instead of appending, we could overwrite the property at each step, but then we'd need to do something like this:

my_property_list = []
for _ in range(100):
    simulation.step(10)
    torch_force = system.getForce(0)
    module = torch_force.getForceModule()
    my_property_list.append(module.my_property)

Pros:

very simple

Cons:

stores data in memory, which could be prohibitive if saving per-atom information
harder to control how often the property is appended
not very coherent with the rest of the OpenMM API

3. Exporting the properties in the forward pass and exposing them with a TorchReporter

class ForceModule(torch.nn.Module):
    """A central harmonic potential that computes energy and writes my_property."""
    def forward(self, positions):

        # should still support returning energy or force tensors instead of 
        # a dict for backwards compatibility / simplicity
        return {"energy": torch.sum(positions ** 2), "my_property": torch.mean(positions)}

class MyReporter(TorchReporter):

    def report(self, file, reportInterval):

        # would need to somehow get access to the output of the forward pass
        self.forward_pass_output = ...

        # my custom IO code

This is just a loose idea, but having some way to asynchronously write the data would be nice. This would require some changes to the current TorchReporter API, namely allowing dicts in addition to torch tensors.

pros:

allows for asynchronous IO on the OpenMM side
better matches the OpenMM API

cons:

more complicated
modifies API

peastman commented 1 month ago

The complication is that TorchForce is a C++ class. The Python class is just a thin wrapper around it. When you create a TorchForce, it serializes your module to a stream of bytes and reconstructs a new module on the C++ side. From that point on there's no longer any connection to the original module. It also has been compiled to TorchScript and has no access to the Python interpreter. It can only contain operations that are supported by TorchScript.

So we need to think of this in terms of the C++ API. For example, we might allow the module to return extra outputs, and add a computeOutputs(context) method that would compute and return all the extra outputs. We would need to figure out what form to return them in. A list of Tensors would be most obvious, but there could be complications in translating them between C++ and Python. Or maybe NumPy arrays.

orionarcher commented 1 month ago

Got it. I admit C++ is not my expertise.

In CHGNet, magnetic moments are calculated by the forward pass of the model, so having a separate computeOutputs would mean extra model calls. It would be fastest to compute the outputs in the forward pass, store them in memory, and expose them through a checkOutputs call or something like that.
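The caching idea above can be illustrated with a minimal plain-Python sketch. It deliberately ignores the TorchScript and C++ boundary discussed earlier and only shows the pattern: the extra property is stored as a by-product of the forward pass and read back later without a second model evaluation. `CachingModel` and `check_outputs` are hypothetical names, not OpenMM-Torch API.

```python
from typing import Dict, List


class CachingModel:
    """Toy stand-in for a model whose forward pass yields extra properties.

    Hypothetical class for illustration only; not part of OpenMM-Torch.
    """

    def __init__(self) -> None:
        self.forward_calls = 0
        self._latest: Dict[str, float] = {}

    def forward(self, positions: List[float]) -> float:
        """Return the energy and cache by-products of the same pass."""
        self.forward_calls += 1
        energy = sum(x * x for x in positions)
        # The extra property falls out of work already done in this pass.
        self._latest = {"my_property": sum(positions) / len(positions)}
        return energy

    def check_outputs(self) -> Dict[str, float]:
        """Return a copy of the outputs cached by the latest forward pass."""
        return dict(self._latest)


model = CachingModel()
energy = model.forward([1.0, 2.0, 3.0])
outputs = model.check_outputs()  # reads the cache; no second evaluation
```

The trade-off is that the cached values always correspond to the most recent forward pass, so the caller has to read them before the next step overwrites them.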

Returning a NumPy array would make sense to me. OpenMM already returns NumPy arrays elsewhere in its API, and they can be converted to Tensors if needed. I personally don't see a use case for backpropagating through the additional property tensors. EDIT: on second thought, if it's possible to return the tensors, it's probably best not to throw away the derivative information.

Alternatively, additional outputs could be periodically written to an H5 file with some sort of Reporter.

falletta commented 2 weeks ago

In support of @orionarcher's request, I think it would be very beneficial to have the ability to access extra properties through OpenMM. Various dielectric properties can be predicted from a machine learning model trained on energy, forces, polarization, polarizability, and Born charges. For instance:
• Polarization and polarizability during MD enable the study of vibrational and dielectric properties, such as infrared and Raman spectroscopy.
• Born charges during MD can be used to perform dynamics under arbitrary time-dependent electric fields.

The inclusion of the electric-field contributions could be done directly in the OpenMM interface. Let U be the energy, E the electric field, P the polarization, α the polarizability, F the forces, e the electron charge, and Z the Born charges. The model is trained to predict quantities in the absence of the field, namely U(0), P(0), F(0), α, Z. Then, the field-dependent electronic structure is determined as follows:
• Potential energy: U(E) = U(0) - E · P
• Forces: F(E) = F(0) + e · Z · E
• Polarization: P(E) = P(0) + α · E
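As a rough illustration, the three field-dependent expressions above could be evaluated as in the following plain-Python sketch. The function name `apply_field` and the data layout (per-atom 3x3 Born charge tensors, a 3x3 polarizability) are assumptions made for the example, and the energy term assumes that P means the zero-field polarization P(0). Consistent units are assumed throughout.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [dot(row, v) for row in m]


def apply_field(u0: float, p0: Vector, f0: List[Vector], alpha: Matrix,
                born: List[Matrix], field: Vector,
                e: float = 1.0) -> Tuple[float, List[Vector], Vector]:
    """Return (U(E), F(E), P(E)) from zero-field quantities (hypothetical helper)."""
    # U(E) = U(0) - E . P, taking P as P(0) (assumption, see lead-in)
    energy = u0 - dot(field, p0)
    # F_i(E) = F_i(0) + e * Z_i . E for each atom i
    forces = [
        [f + e * zf for f, zf in zip(f_atom, matvec(z_atom, field))]
        for f_atom, z_atom in zip(f0, born)
    ]
    # P(E) = P(0) + alpha . E
    polarization = [p + a for p, a in zip(p0, matvec(alpha, field))]
    return energy, forces, polarization


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
minus_identity = [[-x for x in row] for row in identity]
two_identity = [[2.0 * x for x in row] for row in identity]

energy, forces, polarization = apply_field(
    u0=1.0,
    p0=[0.5, 0.0, 0.0],
    f0=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    alpha=two_identity,
    born=[identity, minus_identity],
    field=[0.1, 0.0, 0.0],
)
```

Only vector-vector and matrix-vector products are involved, so the same arithmetic could live inside the model, in the interface, or in a driving script.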

It would be ideal to have the flexibility to define the electric field E in OpenMM at each time step of the MD simulation with an arbitrary time-dependent expression. This, for instance, can be used to study ferroelectric hysteresis using a sinusoidal electric field.
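A sinusoidal field of this kind could be generated by something like the sketch below. `sinusoidal_field` and `field_schedule` are hypothetical helpers, and the choice to update the field once per chunk of MD steps (rather than every step) is an assumption of the example.

```python
import math
from typing import Iterator, List, Tuple


def sinusoidal_field(t: float, amplitude: float, frequency: float,
                     direction: List[float]) -> List[float]:
    """E(t) = amplitude * sin(2*pi*frequency*t) along the given direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    scale = amplitude * math.sin(2.0 * math.pi * frequency * t) / norm
    return [scale * d for d in direction]


def field_schedule(n_chunks: int, steps_per_chunk: int, dt: float,
                   amplitude: float, frequency: float,
                   direction: List[float]) -> Iterator[Tuple[int, List[float]]]:
    """Yield (step, field) at the start of each chunk of MD steps."""
    for chunk in range(n_chunks):
        step = chunk * steps_per_chunk
        yield step, sinusoidal_field(step * dt, amplitude, frequency, direction)


# One full period sampled at four points: 0, +E0, 0, -E0 along z.
schedule = list(field_schedule(4, 25, 0.01, 0.5, 1.0, [0.0, 0.0, 1.0]))
```

In a driving script, each yielded field would be handed to the model before advancing that chunk of steps; how that hand-off happens depends on the API that is eventually chosen and is not shown here.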

More can be found in this paper, which describes the ML model and the LAMMPS interface following this idea.

Having the flexibility in OpenMM to handle extra quantities goes beyond just the scope of dielectric and ferroelectric properties of materials; it could be applied to a variety of response functions to external perturbations. It would be a fantastic addition to the code, and I hope you will consider this point relevant and urgent.

peastman commented 2 weeks ago

Can you suggest what an API for that might look like?

What parts of the calculation would be done by OpenMM, what parts would be internal to the PyTorch model, and what parts would be done at a higher level external to both of them (such as a Python script)?

orionarcher commented 2 weeks ago

class ForceModule(torch.nn.Module):
    """Example of how a user would implement their model with additional properties"""

    def __init__(self, model: torch.nn.Module, electric_field: torch.Tensor):
        super().__init__()  # required before assigning submodules
        self.model = model
        self.electric_field = electric_field

    def forward(self, positions):
        # Zero-field forces and Born charges from the wrapped model
        forces, born_charges = self.model(positions)

        electric_forces = self.model.compute_forces_based_on_field(forces, born_charges, self.electric_field)

        total_forces = forces + electric_forces

        return total_forces

    def compute_outputs(
        self, positions, calculate_polarizability: bool, calculate_polarization: bool
    ) -> dict[str, torch.Tensor]:

        polarizability, polarization = self.model.calculate_electric_properties(
            positions, calculate_polarizability=calculate_polarizability, calculate_polarization=calculate_polarization
        )

        return {"polarizability": polarizability, "polarization": polarization}

from openmmtorch import computeOutputs  # proposed function

# New reporter for property recording
class TorchElectricReporter:
    """Reporter for recording model properties during simulation"""

    def __init__(self, file: str, reportInterval: int, calculate_polarization: bool):
        self.file = file
        self._reportInterval = reportInterval
        self.calculate_polarization = calculate_polarization

    def describeNextReport(self, simulation):
        """Returns information about the next report"""
        steps = self._reportInterval - simulation.currentStep % self._reportInterval
        return (steps, False, False, False, False, True)

    def report(self, simulation):
        """Records the specified properties to the output file"""

        outputs: dict[str, torch.Tensor] = computeOutputs(simulation.context, self.calculate_polarization)

        # user defined code for writing out the outputs to a file

What about something like the above?

The key API modifications are:

Add a compute_outputs method to the ForceModule that returns a dictionary of Tensors (or possibly NumPy arrays). It takes keyword arguments so the user can specify which properties to calculate.
Add a computeOutputs function that takes the simulation context, passes the positions to the ForceModule's compute_outputs method, and returns the results.

This allows us to define a new reporter that calls computeOutputs with the appropriate arguments and writes the results to a file. If the user wanted to calculate the polarizability and polarization at different intervals, they could attach two reporters with different report intervals.

Tracking the Born charges and the electric field can all be handled in the forward pass without any modification to the current API. The calculation of additional outputs is handled by compute_outputs, and the values are exposed to the Python API through computeOutputs.

falletta commented 2 weeks ago

Thanks for sharing this, @orionarcher. An API like the one you suggested should work!

To your questions, @peastman:
• The PyTorch model contains energy, forces, polarization, polarizability and Born charges calculated at zero field.
• The OpenMM interface should load these quantities and, when the electric field is nonzero, the interface should add the electric field contributions to energy, forces, and polarization. This operation involves only basic vector-vector and matrix-vector multiplications (see expressions in my previous message).
• The electric field should be specified by the user, so it should be treated at a higher external level.

peastman commented 2 weeks ago

@orionarcher an API like what you suggest would be very easy to implement. In that design, OpenMM doesn't really do anything with the extra outputs except return them. It's entirely up to you to write your own code to do something with them. Is that sufficient for your needs?

@falletta it sounds like you're asking for something more than that. I'm not completely clear on the details.

> The PyTorch model contains energy, forces, polarization, polarizability and Born charges calculated at zero field.

What does "contains" mean? Are they inputs to the model? Outputs from it? Tensors that are stored inside the model but can be modified? Something else? If you mean they're tensors, how would their values be set?

> the interface should add the electric field contributions to energy, forces, and polarization.

What piece of code computes the contributions from them? Is it part of the PyTorch model, or the TorchForce class? If the latter, what is the reason for not doing it inside the model? How would you tell TorchForce what calculations to do?

> The electric field should be specified by the user, so it should be treated at a higher external level.

Does that mean the electric field is an input to the model? How would the user specify it?

orionarcher commented 2 weeks ago

That would meet my needs. I think it's fine to ask the user to perform additional operations as long as the necessary information can be exposed. The main downside is that any IO in the TorchElectricReporter.report method runs serially with the simulation, but as long as the IO is fast, that shouldn't be an issue. This mirrors how the MDAnalysis OpenMM reporter is structured, which I am drawing on.
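If serial IO did become a bottleneck, one possible workaround on the user side is to hand records to a background thread from inside report. The sketch below uses only the Python standard library; `BackgroundWriter` is a hypothetical helper, and it assumes the outputs have already been converted to JSON-serializable values (for example, nested lists of floats).

```python
import json
import queue
import threading
from typing import Any, Dict


class BackgroundWriter:
    """Write records to a JSON-lines file from a worker thread (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self._queue: "queue.Queue" = queue.Queue()
        self._file = open(path, "w")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def submit(self, step: int, values: Dict[str, Any]) -> None:
        """Queue one record and return immediately, without waiting for disk."""
        self._queue.put({"step": step, **values})

    def _drain(self) -> None:
        # Runs in the worker thread until the None sentinel arrives.
        while True:
            record = self._queue.get()
            if record is None:
                break
            self._file.write(json.dumps(record) + "\n")

    def close(self) -> None:
        """Flush all pending records, stop the worker, and close the file."""
        self._queue.put(None)
        self._thread.join()
        self._file.close()
```

A reporter's report method would then call something like writer.submit(step, outputs) and return, with writer.close() called once after the simulation finishes. The queue is unbounded here, so memory grows if records are produced faster than they are written.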
