Help with strategy for searching alkyl chain conformers

I'd like to ask for suggestions on how to improve my workflow, because at the moment I have to do too much manual work which I would like to automatize.

Task

I want to create a set of conformers of chain-like organic molecules (esters) and save them as input Cartesians. Starting with propyl butanoate I want to systematically enlarge the alkyl chains.

Strategy so far

ttttt.txt Using an all-trans conformer as a starting point (all dihedrals = 180°), I want to change the value of the dihedrals of interest (5 in total) to 60°, 180° or 300°, respectively. For that I load the input Cartesian and set up an own zmatrix in order to know which dihedral I have to change:

ester = cc.Cartesian.read_xyz('ttttt.txt')
ester_frag = ester.fragmentate()
ester_c_table = pd.DataFrame([['origin', 'e_z', 'e_x'], [0, 'e_z', 'e_x'], [1, 0, 'e_x'], [2, 1, 0], [3, 2, 1], [4, 3, 2], [6, 4, 3], [7, 6, 4], [4, 3, 6], [8, 7, 6], [8, 7, 6], [8, 7, 6], [7, 6, 8], [7, 6, 8], [6, 4, 7], [6, 4, 7], [0, 1, 2], [0, 1, 2], [0, 1, 2], [1, 0, 2], [1, 0, 2], [2, 1, 3], [2, 1, 3]], columns=['b', 'a', 'd'], index=[0, 1, 2, 3, 4, 6, 7, 8, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])
ester_const_table = ester.get_construction_table(fragment_list=[(ester_frag[0], ester_c_table)])
zester = ester.get_zmat(ester_const_table)

Now the "backbone" dihedrals are replaced by symbols:

```sympy.init_printing() a, b, c, d, e = sympy.symbols('a, b, c, d, e') symb_zester = zester.copy() symb_zester.safe_loc[3, 'dihedral'] = a symb_zester.safe_loc[4, 'dihedral'] = b symb_zester.safe_loc[6, 'dihedral'] = c symb_zester.safe_loc[7, 'dihedral'] = d symb_zester.safe_loc[8, 'dihedral'] = e ```

Next a list of names for each conformer is created, which is used to generate the torsional angles afterwards. Enantiomeric pairs are deleted.

``` confs = {'t': 180, 'g+': 60, 'g-': 300} name_list = [['t']*5] for i in list(confs.keys()): for j in list(confs.keys()): for k in list(confs.keys()): for l in list(confs.keys()): name_list.append([i, j, 't', k, l]) for i in range(len(name_list)-1): name_list[i] = ['u' if x=='g+' else x for x in name_list[i]] name_list[i] = ['g+' if x=='g-' else x for x in name_list[i]] name_list[i] = ['g-' if x=='u' else x for x in name_list[i]] if (name_list[i] in name_list[(i+1):]): name_list[i] = [] name_list2 = [x for x in name_list if x] ```

Depending on the letter code the torsional angle is substituted and saved as a Cartesian input file.

``` for i in range(len(name_list2)): ester_tmp = symb_zester.subs(a, confs[name_list2[i][0]]).subs(b, confs[name_list2[i][1]]).subs(c, confs[name_list2[i][2]]).subs(d, confs[name_list2[i][3]]).subs(e, confs[name_list2[i][4]]).get_cartesian() cc.Cartesian.to_xyz(ester_tmp, str(''.join(name_list2[i]) + '.xyz')) del ester_tmp ```

Problem description

If I used a completely automatically generated zmatrix I wouldn't see any possibility to predict which number the torsional angles of interest will have. Therefore I need to set up the first few atoms of the zmatrix manually. In order to do so I need to visualize the molecule, realize which C atom is connected to which and properly define the connectivity. However, when at first I let the hydrogens be defined automatically, I realized that not all of them were moved correctly together with their corresponding C atoms, because their dihedrals had unfortunate assignments. The result was, that some of them crashed into the carbons. Therefore I had to explicitly define the connectivity of each and every H atom. Now this manual work is very tedious and very badly scalable for larger molecules. It would be great if it was possible to make my attempt more generally applicable. I would be very glad for any suggestions for improvement!

Your code can be written a lot more cleaner:

Make more vectorised calls. (There was actually a bug in the subs method that did not allow a vectorised call, so please make a git pull before executing the following code).
You only have to define the construction table up to the point of interest. The rest is done automatically.
If fragmentate gives a one element list, it is the unfragmented molecule itself. So just don't fragmentate in this case and use the molecule directly.

import chemcoord as cc
from itertools import product

ester = cc.Cartesian.read_xyz('ttttt.xyz')
ester_c_table = pd.DataFrame([['origin', 'e_z', 'e_x'],
                              [0, 'e_z', 'e_x'],
                              [1, 0, 'e_x'],
                              [2, 1, 0],
                              [3, 2, 1],
                              [4, 3, 2],
                              [6, 4, 3],
                              [7, 6, 4]],
                             columns=['b', 'a', 'd'], index=[0, 1, 2, 3, 4, 6, 7, 8])
ester_const_table = ester.get_construction_table(fragment_list=[(ester, ester_c_table)])
zester = ester.get_zmat(ester_const_table)

symbols = sympy.symbols('a, b, c, d, e')
symb_zester = zester.copy()
symb_zester.safe_loc[[3, 4, 6, 7, 8], 'dihedral'] = symbols

angles = {'t': 180, 'g+': 60, 'g-': 300}
convert = {'g+': 'g-', 'u': 'g-', 'g-': 'g+'}

name_list = [[i, j, 't', k, l] for i, j, k, l in product(*[angles.keys() for _ in range(4)])]

for i, names in enumerate(name_list[:-1]):
    name_list[i] = [convert.get(x, x) for x in names]
    if (name_list[i] in name_list[(i + 1):]):
        name_list[i] = []
name_list = [x for x in name_list if x]

for names in (name_list):
    substitutions = list(zip(symbols, [angles[v] for v in names]))
    cc.Cartesian.to_xyz(symb_zester.subs(substitutions).get_cartesian(),
                        str(''.join(names) + '.xyz'))

For your other question: No there is not a finished function in chemcoord to automatically use one specific carbon atom for the construction table that is important for a specific chemical problem lateron. That is not a problem of implementation but even a human can't construct a zmatrix for a specific problem, if the problem is unknown at definition time.

What you can do:

Write your own function for defining the construction table that is streamlined for your problem.
If this works well, create a Class that inherits from Cartesian and overwrite Cartesian.get_construction_table with your own function. This gives you a Class streamlined for your problem.
I won't accept changes into this general library, that are written for a particular problem. But if you write generally useful helper functions in step 1 and 2 like Cartesian.get_functional_groups('ester') I will happily accept them in a Pull Request.

Thank you very much for cleaning up my code! At first I tried to construct the zmatrix only up to the point of interest, however, as I've written above, this gave birth to an unexpected behaviour: Some of the dihedral angles of the hydrogens were defined in such a way, that they didn't follow the movement/rotation of the carbon atom they're attached to but stayed were they are. The result was a "broken" molecule. This behaviour can be circumvented if it is made sure, that the reference point of the hydrogens are the neighbouring C atoms. A small example to demonstrate what I mean:

Let's take butane C1-C2-C3-C4 as a substitute for my specific ester problem. In the automatically generated zmatrix the reference of the two H on C3 was "bond to C3, angle to C2, dihedral to C1". If now one rotates around the C2-C3 bond (dihedral C1C2C3C4), these H won't move. However, if the references of the two H on C3 are changed to "bond to C3, angle to C2, dihedral to C4" instead, they will move properly.

About your suggestions: Do you have an idea for a starting point for the own function?

Ah sorry, now I got the point. The problem is that "bond to C3, angle to C2, dihedral to C1" is in my opinion the a priori better (more chemical) selection of references. At least I would build up Zmatrices this way. So this is definetely not a bug in the existing implementation and the generic get_construction_table won't change.

BUT: If you find a selection of references, where you have the feeling that this is the generically wrong selection please let me know.

Tips for writing your function:

Use testing (Even better, write tests before you start coding)! https://docs.pytest.org/en/latest/. You can have a look at my testing suite, how to set it up.
Just work on the graph defined by the connectivity table given from get_bonds and don't use the cartesian space itself.
All methods dealing with finding the construction table and constructing the zmatrix are here I recommend to have a look there.
The method that is most important for you is _get_frag_constr_table. This function assumes, that the argument molecule can not be fragmented. The get_construction_table lateron calls _get_frag_constr_table on each fragment and puts the pieces together. So if you have just one molecule, get_construction_table(...) is basically a call to _get_frag_constr_table.
Have a look at https://pypi.python.org/pypi/sortedcontainers and Cartesian._give_val_sorted_bond_dict. This returns a connectivity table as dict of sorted sets, sorted by valency. Which came in quite handy throughout writing.

mcocdawc / chemcoord