mcocdawc / chemcoord

A python module for manipulating cartesian and internal coordinates.
GNU Lesser General Public License v3.0
72 stars 19 forks source link

Help with strategy for searching alkyl chain conformers #48

Closed Eldrad-Ulthran closed 7 years ago

Eldrad-Ulthran commented 7 years ago

I'd like to ask for suggestions on how to improve my workflow, because at the moment I have to do too much manual work which I would like to automatize.

Task

I want to create a set of conformers of chain-like organic molecules (esters) and save them as input Cartesians. Starting with propyl butanoate I want to systematically enlarge the alkyl chains.

Strategy so far

ttttt.txt Using an all-trans conformer as a starting point (all dihedrals = 180°), I want to change the value of the dihedrals of interest (5 in total) to 60°, 180° or 300°, respectively. For that I load the input Cartesian and set up an own zmatrix in order to know which dihedral I have to change:

ester = cc.Cartesian.read_xyz('ttttt.txt')
ester_frag = ester.fragmentate()
ester_c_table = pd.DataFrame([['origin', 'e_z', 'e_x'], [0, 'e_z', 'e_x'], [1, 0, 'e_x'], [2, 1, 0], [3, 2, 1], [4, 3, 2], [6, 4, 3], [7, 6, 4], [4, 3, 6], [8, 7, 6], [8, 7, 6], [8, 7, 6], [7, 6, 8], [7, 6, 8], [6, 4, 7], [6, 4, 7], [0, 1, 2], [0, 1, 2], [0, 1, 2], [1, 0, 2], [1, 0, 2], [2, 1, 3], [2, 1, 3]], columns=['b', 'a', 'd'], index=[0, 1, 2, 3, 4, 6, 7, 8, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])
ester_const_table = ester.get_construction_table(fragment_list=[(ester_frag[0], ester_c_table)])
zester = ester.get_zmat(ester_const_table)

Now the "backbone" dihedrals are replaced by symbols:

```sympy.init_printing() a, b, c, d, e = sympy.symbols('a, b, c, d, e') symb_zester = zester.copy() symb_zester.safe_loc[3, 'dihedral'] = a symb_zester.safe_loc[4, 'dihedral'] = b symb_zester.safe_loc[6, 'dihedral'] = c symb_zester.safe_loc[7, 'dihedral'] = d symb_zester.safe_loc[8, 'dihedral'] = e ```

Next a list of names for each conformer is created, which is used to generate the torsional angles afterwards. Enantiomeric pairs are deleted.

``` confs = {'t': 180, 'g+': 60, 'g-': 300} name_list = [['t']*5] for i in list(confs.keys()): for j in list(confs.keys()): for k in list(confs.keys()): for l in list(confs.keys()): name_list.append([i, j, 't', k, l]) for i in range(len(name_list)-1): name_list[i] = ['u' if x=='g+' else x for x in name_list[i]] name_list[i] = ['g+' if x=='g-' else x for x in name_list[i]] name_list[i] = ['g-' if x=='u' else x for x in name_list[i]] if (name_list[i] in name_list[(i+1):]): name_list[i] = [] name_list2 = [x for x in name_list if x] ```

Depending on the letter code the torsional angle is substituted and saved as a Cartesian input file.

``` for i in range(len(name_list2)): ester_tmp = symb_zester.subs(a, confs[name_list2[i][0]]).subs(b, confs[name_list2[i][1]]).subs(c, confs[name_list2[i][2]]).subs(d, confs[name_list2[i][3]]).subs(e, confs[name_list2[i][4]]).get_cartesian() cc.Cartesian.to_xyz(ester_tmp, str(''.join(name_list2[i]) + '.xyz')) del ester_tmp ```

Problem description

If I used a completely automatically generated zmatrix I wouldn't see any possibility to predict which number the torsional angles of interest will have. Therefore I need to set up the first few atoms of the zmatrix manually. In order to do so I need to visualize the molecule, realize which C atom is connected to which and properly define the connectivity. However, when at first I let the hydrogens be defined automatically, I realized that not all of them were moved correctly together with their corresponding C atoms, because their dihedrals had unfortunate assignments. The result was, that some of them crashed into the carbons. Therefore I had to explicitly define the connectivity of each and every H atom. Now this manual work is very tedious and very badly scalable for larger molecules. It would be great if it was possible to make my attempt more generally applicable. I would be very glad for any suggestions for improvement!

mcocdawc commented 7 years ago

Your code can be written a lot more cleaner:

  1. Make more vectorised calls. (There was actually a bug in the subs method that did not allow a vectorised call, so please make a git pull before executing the following code).
  2. You only have to define the construction table up to the point of interest. The rest is done automatically.
  3. If fragmentate gives a one element list, it is the unfragmented molecule itself. So just don't fragmentate in this case and use the molecule directly.
import chemcoord as cc
from itertools import product

ester = cc.Cartesian.read_xyz('ttttt.xyz')
ester_c_table = pd.DataFrame([['origin', 'e_z', 'e_x'],
                              [0, 'e_z', 'e_x'],
                              [1, 0, 'e_x'],
                              [2, 1, 0],
                              [3, 2, 1],
                              [4, 3, 2],
                              [6, 4, 3],
                              [7, 6, 4]],
                             columns=['b', 'a', 'd'], index=[0, 1, 2, 3, 4, 6, 7, 8])
ester_const_table = ester.get_construction_table(fragment_list=[(ester, ester_c_table)])
zester = ester.get_zmat(ester_const_table)

symbols = sympy.symbols('a, b, c, d, e')
symb_zester = zester.copy()
symb_zester.safe_loc[[3, 4, 6, 7, 8], 'dihedral'] = symbols

angles = {'t': 180, 'g+': 60, 'g-': 300}
convert = {'g+': 'g-', 'u': 'g-', 'g-': 'g+'}

name_list = [[i, j, 't', k, l] for i, j, k, l in product(*[angles.keys() for _ in range(4)])]

for i, names in enumerate(name_list[:-1]):
    name_list[i] = [convert.get(x, x) for x in names]
    if (name_list[i] in name_list[(i + 1):]):
        name_list[i] = []
name_list = [x for x in name_list if x]

for names in (name_list):
    substitutions = list(zip(symbols, [angles[v] for v in names]))
    cc.Cartesian.to_xyz(symb_zester.subs(substitutions).get_cartesian(),
                        str(''.join(names) + '.xyz'))

For your other question: No there is not a finished function in chemcoord to automatically use one specific carbon atom for the construction table that is important for a specific chemical problem lateron. That is not a problem of implementation but even a human can't construct a zmatrix for a specific problem, if the problem is unknown at definition time.

What you can do:

  1. Write your own function for defining the construction table that is streamlined for your problem.
  2. If this works well, create a Class that inherits from Cartesian and overwrite Cartesian.get_construction_table with your own function. This gives you a Class streamlined for your problem.
  3. I won't accept changes into this general library, that are written for a particular problem. But if you write generally useful helper functions in step 1 and 2 like Cartesian.get_functional_groups('ester') I will happily accept them in a Pull Request.
Eldrad-Ulthran commented 7 years ago

Thank you very much for cleaning up my code! At first I tried to construct the zmatrix only up to the point of interest, however, as I've written above, this gave birth to an unexpected behaviour: Some of the dihedral angles of the hydrogens were defined in such a way, that they didn't follow the movement/rotation of the carbon atom they're attached to but stayed were they are. The result was a "broken" molecule. This behaviour can be circumvented if it is made sure, that the reference point of the hydrogens are the neighbouring C atoms. A small example to demonstrate what I mean:

Let's take butane C1-C2-C3-C4 as a substitute for my specific ester problem. In the automatically generated zmatrix the reference of the two H on C3 was "bond to C3, angle to C2, dihedral to C1". If now one rotates around the C2-C3 bond (dihedral C1C2C3C4), these H won't move. However, if the references of the two H on C3 are changed to "bond to C3, angle to C2, dihedral to C4" instead, they will move properly.

About your suggestions: Do you have an idea for a starting point for the own function?

mcocdawc commented 7 years ago

Ah sorry, now I got the point. The problem is that "bond to C3, angle to C2, dihedral to C1" is in my opinion the a priori better (more chemical) selection of references. At least I would build up Zmatrices this way. So this is definetely not a bug in the existing implementation and the generic get_construction_table won't change.

BUT: If you find a selection of references, where you have the feeling that this is the generically wrong selection please let me know.

Tips for writing your function:

  1. Use testing (Even better, write tests before you start coding)! https://docs.pytest.org/en/latest/. You can have a look at my testing suite, how to set it up.
  2. Just work on the graph defined by the connectivity table given from get_bonds and don't use the cartesian space itself.
  3. All methods dealing with finding the construction table and constructing the zmatrix are here I recommend to have a look there.
  4. The method that is most important for you is _get_frag_constr_table. This function assumes, that the argument molecule can not be fragmented. The get_construction_table lateron calls _get_frag_constr_table on each fragment and puts the pieces together. So if you have just one molecule, get_construction_table(...) is basically a call to _get_frag_constr_table.
  5. Have a look at https://pypi.python.org/pypi/sortedcontainers and Cartesian._give_val_sorted_bond_dict. This returns a connectivity table as dict of sorted sets, sorted by valency. Which came in quite handy throughout writing.