openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
305 stars 90 forks source link

Passing keyword arguments to underlying toolkits #1610

Open tlfobe opened 1 year ago

tlfobe commented 1 year ago

Is your feature request related to a problem? Please describe. I'm using the OpenFF-toolkit to parameterize some larger systems (300+ atoms), and I currently have a bootleg work around, where I hardcode the OpenEye assign_partial_charges method to have the keyword argument maxAtoms = 500, letting me parameterize larger systems. I can then install this local version to my conda environment, and things work. (yes these take forever to parameterize, usually I can get something to finish parameterizing in ~12-24 hours if I'm using the OpenEye implementation of AM1-BCC).

I was thinking it might be useful to other users to be able to pass keyword arguments to the underlying toolkits, so that in cases like this there is a way to access some of the options in those toolkits.

I could see where this might be problematic since not all the toolkits have the same keyword arguments, but maybe a keyword argument handler could be used here to elegantly pass the right keyword parameters to the underlying toolkit functions and throw a warning if keywords are not used.

Describe the solution you'd like Adding some instances where keyword arguments to underlying toolkit are passeable to some of the top-level funcitons would make the toolkit a little more flexible for non-standard uses. In my case the assign_partial_charges, but I think there are several instances where something like this might be desireable.

Initial Example

I have a crude working example here with changes to the Molecule.assign_partial_charges method:

def assign_partial_charges(
        self,
        partial_charge_method: str,
        strict_n_conformers: bool = False,
        use_conformers: Optional[Iterable[Quantity]] = None,
        toolkit_registry: TKR = GLOBAL_TOOLKIT_REGISTRY,
        normalize_partial_charges: bool = True,
        **kwargs
    ):
        """
        Calculate partial atomic charges and store them in the molecule.

        ``assign_partial_charges`` computes charges using the specified toolkit
        and assigns the new values to the ``partial_charges`` attribute.
        Supported charge methods vary from toolkit to toolkit, but some
        supported methods are:

        - ``"am1bcc"``
        - ``"am1bccelf10"`` (requires OpenEye Toolkits)
        - ``"am1-mulliken"``
        - ``"mmff94"``
        - ``"gasteiger"``

        By default, the conformers on the input molecule are not used
        in the charge calculation. Instead, any conformers needed for
        the charge calculation are generated by this method. If this
        behavior is undesired, specific conformers can be provided via the
        ``use_conformers`` argument.

        ELF10 methods will neither fail nor warn when fewer than the
        expected number of conformers could be generated, as many small
        molecules are too rigid to provide a large number of conformers. Note
        that only the ``"am1bccelf10"`` partial charge method uses ELF
        conformer selection; the ``"am1bcc"`` method only uses a single
        conformer. This may confuse users as the `ToolkitAM1BCC`_ SMIRNOFF
        tag in a force field file defines that AM1BCC-ELF10 should be used
        if the OpenEye Toolkits are available.

        For more supported charge methods and their details, see the
        corresponding methods in each toolkit wrapper:

        - :meth:`OpenEyeToolkitWrapper.assign_partial_charges \
          <openff.toolkit.utils.toolkits.OpenEyeToolkitWrapper.assign_partial_charges>`
        - :meth:`RDKitToolkitWrapper.assign_partial_charges \
          <openff.toolkit.utils.toolkits.RDKitToolkitWrapper.assign_partial_charges>`
        - :meth:`AmberToolsToolkitWrapper.assign_partial_charges \
          <openff.toolkit.utils.toolkits.AmberToolsToolkitWrapper.assign_partial_charges>`
        - :meth:`BuiltInToolkitWrapper.assign_partial_charges \
          <openff.toolkit.utils.toolkits.BuiltInToolkitWrapper.assign_partial_charges>`

        .. _ToolkitAM1BCC: https://openforcefield.github.io/standards/standards/smirnoff/\
            #toolkitam1bcc-temporary-support-for-toolkit-based-am1-bcc-partial-charges

        Parameters
        ----------
        partial_charge_method : string
            The partial charge calculation method to use for partial charge
            calculation.
        strict_n_conformers : bool, default=False
            Whether to raise an exception if an invalid number of conformers is
            provided for the given charge method. If this is False and an
            invalid number of conformers is found, a warning will be raised.
        use_conformers : Arrays with shape (n_atoms, 3) and dimensions of distance
            Coordinates to use for partial charge calculation. If ``None``, an
            appropriate number of conformers will be generated.
        toolkit_registry
            :class:`ToolkitRegistry` or :class:`ToolkitWrapper` to use for the
            calculation.
        normalize_partial_charges : bool, default=True
            Whether to offset partial charges so that they sum to the total
            formal charge of the molecule. This is used to prevent accumulation
            of rounding errors when the partial charge assignment method returns
            values at limited precision.

        Examples
        --------

        Generate AM1 Mulliken partial charges. Conformers for the AM1
        calculation are generated automatically:

        >>> molecule = Molecule.from_smiles('CCCCCC')
        >>> molecule.assign_partial_charges('am1-mulliken')

        To use pre-generated conformations, use the ``use_conformers`` argument:

        >>> molecule = Molecule.from_smiles('CCCCCC')
        >>> molecule.generate_conformers(n_conformers=1)
        >>> molecule.assign_partial_charges(
        ...     'am1-mulliken',
        ...     use_conformers=molecule.conformers
        ... )

        Raises
        ------
        InvalidToolkitRegistryError
            If an invalid object is passed as the toolkit_registry parameter

        See Also
        --------
        openff.toolkit.utils.toolkits.OpenEyeToolkitWrapper.assign_partial_charges
        openff.toolkit.utils.toolkits.RDKitToolkitWrapper.assign_partial_charges
        openff.toolkit.utils.toolkits.AmberToolsToolkitWrapper.assign_partial_charges
        openff.toolkit.utils.toolkits.BuiltInToolkitWrapper.assign_partial_charges
        """

        # Raise a warning when users try to apply these charge methods to "large" molecules
        WARN_LARGE_MOLECULES: Set[str] = {
            "am1bcc",
            "am1bccelf10",
            "am1-mulliken",
            "am1bccnosymspt",
            "am1elf10",
        }

        if partial_charge_method in WARN_LARGE_MOLECULES:
            if self.n_atoms > 150:
                warnings.warn(
                    f"Warning! Partial charge method '{partial_charge_method}' is not designed "
                    "for use on large (i.e. > 150 atoms) molecules and may crash or take hours to "
                    f"run on this molecule (found {self.n_atoms} atoms). For more, see "
                    "https://docs.openforcefield.org/projects/toolkit/en/stable/faq.html"
                    "#parameterizing-my-system-which-contains-a-large-molecule-is-taking-forever-whats-wrong",
                    stacklevel=2,
                )

        if isinstance(toolkit_registry, ToolkitRegistry):
            # We may need to try several toolkitwrappers to find one
            # that supports the desired partial charge method, so we
            # tell the ToolkitRegistry to continue trying ToolkitWrappers
            # if one raises an error (raise_exception_types=[])
            toolkit_registry.call(
                "assign_partial_charges",
                molecule=self,
                partial_charge_method=partial_charge_method,
                use_conformers=use_conformers,
                strict_n_conformers=strict_n_conformers,
                normalize_partial_charges=normalize_partial_charges,
                raise_exception_types=[],
                _cls=self.__class__,
                **kwargs,
            )
        elif isinstance(toolkit_registry, ToolkitWrapper):
            toolkit_wrapper: ToolkitWrapper = toolkit_registry
            toolkit_wrapper.assign_partial_charges(  # type: ignore[attr-defined]
                self,
                partial_charge_method=partial_charge_method,
                use_conformers=use_conformers,
                strict_n_conformers=strict_n_conformers,
                normalize_partial_charges=normalize_partial_charges,
                _cls=self.__class__,
                **kwargs,
            )
        else:
            raise InvalidToolkitRegistryError(
                f"Invalid toolkit_registry passed to assign_partial_charges."
                f"Expected ToolkitRegistry or ToolkitWrapper. Got  {type(toolkit_registry)}"
            )

Where I just pass a **kwargs to the assign_partial_charges method. I also changed the openeye_wrapper's version of the assign_partial_charges method to accept kwargs as well.

I'd be happy to work on a PR for this if it's of interest to the OpenFF community!

j-wags commented 1 year ago

Thanks for the great writeup, @tlfobe! I'd thought about doing something like this in #471 but got scared away by how complex the ToolkitWrapper and ToolkitRegistry interface already ended up being. I think I'm still generally against this change because of 1) the implementation cost, complexity, and maintenance burden it would add, 2) the importance of committing to a stable plugin interface for downstream devs, and 3) the likelihood that this would introduce silent errors.

1) the implementation cost, complexity, and maintenance burden it would add

As a recap, the call stack for something like offmol.assign_partial_charges is:

The first three calls are in our public API, so we must ensure that they have the right behavior both when called directly by the user and also when called by by higher methods in the stack.

Also, as ToolkitRegistry.call iterates through each ToolkitWrappers it contains...

(This is a bonkers system that I regret implementing, but it's in production now and it gets the job done, and it's frankly hard to think of a simple alternative. I keep meaning to get around to documenting this but I haven't had time to work on the developer docs in a few years :-( )

So to add support for arbitrary kwargs would require rethinking/modifying the function signature of several things in Molecule/ToolkitRegistry/ToolkitWrapper (or adding burdensome kwarg registries for all TKW methods), determining which exception types may be raised by TKWs for unrecognized kwargs (or by new failure modes triggered by the value of a kwarg), and how Molecule methods should react to each type of error (that is, which raise_exception_types value it should pass in).

So... it would be a lot of work for us to implement this correctly.

2) the importance of committing to a stable plugin interface for downstream devs

It wouldn't just take our time - All external folks who maintain ToolkitWrappers will also have the obligation put on them to implement a kwarg registry and raise the correct sorts of exceptions/have the right behaviors that Molecule.assign_partial_charges expects when it sets raise_exception_types.

Strategically, downstream developers are super important for OpenFF. We have very limited personal bandwidth to engage with other devs, so our long-term strategy needs to rely on passive incorporation of our tools. Our "contract" with downstream devs is the plugin interface, so to change that and put work on their shoulders is a big ask. Having a plugin interface that only has positional and named kwargs is a commitment on our part to not slip changes through without an easily-discoverable commit trail.

3) the likelihood that this would introduce silent errors

Method signatures ending in **kwargs have a bad smell to me - I've had several experiences where this leads to methods silently ignoring something important and giving users the wrong results. I think a correctly-implemented kwarg handler wouldn't have this problem, but bugs where a kwarg handler lets in something it shouldn't could land in the worst category of errors ("silent but deadly"). Matplotlib/pyplot come to mind for this (but at least those outputs are immediately visualized so users can spot errors!)

Overall

So, given that there are available workarounds[1] and that this would be a lot of work for us and other to implement and maintain, I probably wouldn't accept a PR on this. I do appreciate that you opened this issue to discuss it, though! I'll leave this open for discussion, and while I could imagine that we could find a design that resolves my concerns with items 2 and 3, the limited benefit doesn't seem like it'd be worth the work of overcoming item 1.

[1] instead of changing any source code, I think you could make a new ToolkitWrapper that has your desired changes to charge assignment, import it, run TLFobesToolkitWrapper.assign_partial_charges(offmol), and then do ff.create_openmm_system(top, charge_from_molecules=[offmol])