mosdef-hub / mbuild

A hierarchical, component based molecule builder
https://mbuild.mosdef.org
Other
171 stars 79 forks source link

Periodic polymer #1170

Closed jaclark5 closed 4 months ago

jaclark5 commented 5 months ago

PR Summary:

Resolves #1171

PR Checklist


codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 92.85714% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 87.35%. Comparing base (2bb047e) to head (e5a1364). Report is 2 commits behind head on main.

:exclamation: Current head e5a1364 differs from pull request most recent head a36afc7. Consider uploading reports for the commit a36afc7 to get more accurate results

Files Patch % Lines
mbuild/lib/recipes/polymer.py 92.85% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1170 +/- ## ========================================== + Coverage 87.32% 87.35% +0.02% ========================================== Files 62 62 Lines 6525 6540 +15 ========================================== + Hits 5698 5713 +15 Misses 827 827 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

chrisjonesBSU commented 5 months ago

@jaclark5 Funnily enough, just last week a co-worker and I were discussing how we could do this to help with a problem we are facing, so thank you for getting this started!

I'm reading through the lines 202 to 221 where the error is raised. It seems like the goal here is to align the chain backbone along the z-axis so that this periodic bond is roughly <0, 0, 1>, and the periodicity of the compound is [False, False, True], is that correct?

This seems to work great with the CH2() class since the atoms are conveniently set up perfectly, but issues come up with monomers that aren't as curated.

For example, if I try using SMILES to create a polyethylene chain:

import mbuild as mb
from mbuild.lib.recipes import Polymer

mon = mb.load("CC", smiles=True)
poly = Polymer()
poly.add_monomer(compound=mon, indices=[2, 6], separation=0.15)
poly.build(n=10, add_hydrogens=False, make_periodic=True)

then this line:

212             self.periodic = list(
213                 ~np.isclose(
214                     head_port.pos - tail_port.pos, 0.0, np.finfo(float).eps
215                 )
216             )

returns [True, True, True] and the error is raised.

If z-axis alignment is going to be enforced, maybe we should call the z_axis_transform method first (which is in mbuild.coordinate_transform):

        if make_periodic:
            if np.any([x is not None for x in self._end_groups]):
                raise ValueError(
                    "Keyword 'make_periodic' cannot be defined if 'end_groups' are provided."
                )
            z_axis_transform(
                    compound=self,
                    point_on_z_axis=head_port.pos,
                    point_on_zx_plane=tail_port.pos
            )

With the polyethylene example above,

head_port.pos - tail_port.pos is array([-2.03379188, -1.14121417, 0.8149497 ]) without calling the z_axis_transform but if we do call it, then it becomes array([ 3.83137532e-13, -5.59901469e-05, 2.47038908e+00]) which is a lot closer to [0, 0, 1] if we allow for some tolerance.

So, for the sake of being more generally usable (especially with monomers created from SMILES), I think we should 1) call the z_axis_transform method under the if statement first and 2) add a parameter that allows one to set atol in the np.isclose method, maybe we add it as a prameter to build() call it something like z_axis_tol

Let me know what you think, or if this causes any other issues that I'm missing.

jaclark5 commented 5 months ago

Hey @chrisjonesBSU that sounds good to me! Do you want to add that as a contributor to this PR, or I can add it in.

I think this change will provide a more robust function, but I don't think that's the overall problem. I've been using this with a LJ system which is easy to align perfectly, but it's throwing an error because the x and y coordinates aren't perfect, despite being within 0.001 Å. I could add a tolerance keyword argument, but I'll just make it relative to the longest bond length in the system.

jaclark5 commented 5 months ago

I think it should be a method, but that it should still be a keyword argument in build() for two reasons: 1) All current methods of the Polymer recipe are also inputs for build 2) With add_hydrogens and end_groups as kwargs in build() there is a precedent for including capping options in that method, and honestly as a user I didn't know about the other methods in Polymer until recently because all I needed was build(), so why remove another capping option away from a user's fingertips?

CalCraven commented 5 months ago

I think it should be a method, but that it should still be a keyword argument in build() for two reasons:

  1. All current methods of the Polymer recipe are also inputs for build I'm not sure about this. Methods that are not arguments to build: add_monomer, add_end_groups are the two main methods to the class, which are not part of the build method. The build arguments are build(self, n, sequence="A", add_hydrogens=True), none of which are methods in the class.

  2. With add_hydrogens and end_groups as kwargs in build() there is a precedent for including capping options in that method, and honestly as a user I didn't know about the other methods in Polymer until recently because all I needed was build(), so why remove another capping option away from a user's fingertips? Hmmm, I think this is an interesting argument. I guess I'm less worried about precedent as a whole, and more worried about functionality for the code. The things I'm considering are:

    • How often used are the two functions in conjunction as opposed to separately?

      While the wrapping is useful and cool, I would say at least 90% or greater of the time, people are just using these functions to bond monomers together.

    • How independently can the two functions be implemented?

      Should be pretty easy, there's nothing core about one function that needs to influence the other.

    • What are the purposes of the sections of code?

      build(): Just to recursively bond together monomer groups into your pattern and replicates periodic_wrap(): Take the end groups and bond them across a periodic boundary while aligning the group

    • What is the amount of information we're providing the user

      I think the build function is pretty clear and intuitive. I think the periodic wrap is a bit of extra complexity. Let's move the whole molecule with some complex transformation. Let's add this bond to make a periodic molecule in one dimension that needs to be handled carefully when setting the simulation box.

I think with these answers in mind, and the Zen of python echoing that explicit is better than implicit, this should be a separate method to call explicitly.

chrisjonesBSU commented 5 months ago

I am learning towards agreeing with @jaclark5.

First, the periodic bond doesn't necessarily need to be handled carefully. The bonds and angles being off initially is fixed easily by a quick energy minimization (e.g. short simulation after finishing mbuild -> gmso -> simulation engine). But yeah, I also agree that the self.periodicity attribute doesn't need to be set or changed here.

Second, the information needed to form this bond is already conveniently available during the build step and is lost after building up the polymer is complete. If I create a 20mer from a monomer, I don't know what 2 particles (or ports) to pass into force_overlap after the fact without some extra work.

Also, I agree that this process isn't that different in principle than 1) Placing hydrogen atoms back onto the head and tail monomers out of convenience and 2) adding end-groups monomers automatically after building up the polymer.

chrisjonesBSU commented 5 months ago

Ahh, I see you addressed the second point in an earlier comment @CalCraven

May also need to adjust some port removal at the end of the build stage, since it seems like it wipes all ports. Either that, or this align_periodic_bond method takes the particles to create the bonds between as the argument, and if it isn't passed just takes the first and last children in the list.

Either way, I think that we should make it easier to automatically bond the head and tail monomers. We shouldn't have more indices/particles/ports parameters that need to be manually defined later. Putting this in build seems fine IMO.

If we are going to have a separate function to both orient a polymer chain along an axis and form a head-tail-bond, then maybe the head and tail ports should be stored as properties or attributes of the Polymer class? I think keeping these around and making them more accessible would be beneficial. Having these as attributes would also make using the coordiante_transform methods easier. Aligning long polymer chains along a specific direction can be difficult without choosing the right particles, and is useful even in cases where you aren't needing to form a head-tail bond.

CalCraven commented 5 months ago

Either way, I think that we should make it easier to automatically bond the head and tail monomers. We shouldn't have more indices/particles/ports parameters that need to be manually defined later. Putting this in build seems fine IMO.

Agreed, the indices stuff can get confusing since we don't provide a great way to visualize which particles have which indices other than print with enumeration.

If we are going to have a separate function to both orient a polymer chain along an axis and form a head-tail-bond, then maybe the head and tail ports should be stored as properties or attributes of the Polymer class? I think keeping these around and making them more accessible would be beneficial. Having these as attributes would also make using the coordiante_transform methods easier. Aligning long polymer chains along a specific direction can be difficult without choosing the right particles, and is useful even in cases where you aren't needing to form a head-tail bond.

I think that's a great idea, that would be a useful feature to have for other Polymer based methods in general, such as some sort of crosslinking algorithm. I'm also thinking if you wanted to make a polymer with more ports such that you could form other bonds later, just bllindly removing all the ports maybe should be adjusted.

CalCraven commented 5 months ago

I am learning towards agreeing with @jaclark5.

First, the periodic bond doesn't necessarily need to be handled carefully. The bonds and angles being off initially is fixed easily by a quick energy minimization (e.g. short simulation after finishing mbuild -> gmso -> simulation engine). But yeah, I also agree that the self.periodicity attribute doesn't need to be set or changed here.

Second, the information needed to form this bond is already conveniently available during the build step and is lost after building up the polymer is complete. If I create a 20mer from a monomer, I don't know what 2 particles (or ports) to pass into force_overlap after the fact without some extra work.

Also, I agree that this process isn't that different in principle than 1) Placing hydrogen atoms back onto the head and tail monomers out of convenience and 2) adding end-groups monomers automatically after building up the polymer.

Yeah these are good points. I am surprised that the energy minimization works at all with a molecule with this high of energy. I think that's what I'm most worried about, adding a few hydrogens should work in almost all cases. However, creating this bond seems to me like it would cause lots of issues in a simulation workflow if you didn't really know what you were doing.

chrisjonesBSU commented 5 months ago

If we go in the direction of having properties for head and tail ports, then I think the workflow of having create_periodic_bond() outside of build() becomes more approachable:

polymer = Polymer(...)
polymer.build(n=5)
polymer.create_periodic_bond(axis="z")

as opposed to

polymer = Polymer(...)
polymer.build(n=5, periodic_axis="z")

One thing that could come up is if add_hydrogens=True is set in build and create_periodic_bond is called later.

I think there are multiple ways to handle this. We could throw and error in create_periodic_bond() and the user has to re-call the build() with add_hydrogens=False. Or, even if the head and tail monomers were capped with hydrogens, I think we can find them easily enough and remove them in create_periodic_bond() using self.head_port.anchor.

Let us know what you think @jaclark5

CalCraven commented 5 months ago

Agreed, I think you could still add the bond even if the polymer is capped with hydrogens. I think the way to do that would be to add an argument remove_caps=True. If this is true, make sure to remove a hydrogen from the anchors of the head_port and tail_port.

head = head_port.anchor
bonded_atoms = list(nx.neighbors(self.root.bond_graph, head))
cappedBool = False
for atom in bonded_atoms:
    if atom.name == "H":
        self.remove(atom)
        raise UserWarning(f"Removing {atom} because remove_caps was set to True")
        cappedBool = True
        break
if not cappedBool and remove_caps:
     raise ValueError(f"Could not find Hydrogen on {head =} or {tail =} to remove. Consider setting remove_caps=False")
CalCraven commented 4 months ago

Code coverage should pass if you just add a loop to check the axis arguments y and z to go along with the x test that is already being performed.

chrisjonesBSU commented 4 months ago

I think this is ready to merge. It could use a couple more tests in test_polymers. For example, ensure an error is raised when someone builds with end groups or add_hydrogens = True and then calls create_periodic_bond, and like Cal suggested, try passing in all of x, y z to create_periodic_bond

Thank you @jaclark5, this is a really helpful addition!

CalCraven commented 4 months ago

Once this is updated to the main branch, we can merge!