mammasmias / IterativeRotationsAssignments

https://mammasmias.github.io/IterativeRotationsAssignments/
Other
8 stars 2 forks source link

Probleme when using Python API "ira.match" for different size systemes. #6

Open Neo-Teyrall opened 3 months ago

Neo-Teyrall commented 3 months ago

Hi,

I try to used the algorithm IRA, to compare systemes which is composed of different number atomes.

r, t, p, hd = ira.match(n_i, atom_type_i, positions_i, n_j, ira.matchatom_type_j, positions_j, kmax_factor) with $i$ a systemes made of 60 atomes and $j$ a systemes made of 61 atomes. I gave the folloing error

At line 382 of file /home/wmargerit/Applications/IterativeRotationsAssignments/src//cshda.f90
Fortran runtime error: Index '62' of dimension 1 of array 'found' above upper bound of 61

I do not understand, where i made is my mitake. Is the ira.match not the good methode to used?

The parameters i use.

n_i = 60
atomtype_i = [ 7  6  6  8  7  6  6  6  6  6  8  7  6  6  8  8  7  6  6  8  7  6  6  6
               6  6  8  7  6  6  8  8 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
              29 29 29 29 29 29 29 29 29 29 29 29]
positions_i = [
 [ 0.7    3.169  2.51 ]
 [ 0.379  3.815  3.767]
 [-0.444  5.076  3.541]
 [-0.747  5.427  2.403]
 [-0.806  5.756  4.631]
 [-0.731  5.252  6.03 ]
 [-1.73   6.138  6.722]
 [-1.688  7.446  6.005]
 [-1.403  7.076  4.548]
 [-0.45   8.066  3.896]
 [-0.883  9.055  3.307]
 [ 0.854  7.8    4.001]
 [ 1.85   8.538  3.252]
 [ 2.813  7.597  2.545]
 [ 2.542  6.404  2.422]
 [ 3.875  8.017  2.089]
 [10.877  3.169  2.51 ]
 [10.556  3.815  3.767]
 [ 9.733  5.076  3.541]
 [ 9.43   5.427  2.403]
 [ 9.371  5.756  4.631]
 [ 9.446  5.252  6.03 ]
 [ 8.447  6.138  6.722]
 [ 8.489  7.446  6.005]
 [ 8.774  7.076  4.548]
 [ 9.727  8.066  3.896]
 [ 9.294  9.055  3.307]
 [11.031  7.8    4.001]
 [12.027  8.538  3.252]
 [12.99   7.597  2.545]
 [12.719  6.404  2.422]
 [14.052  8.017  2.089]
 [-1.276  2.211  0.   ]
 [ 1.276  2.211  0.   ]
 [ 8.934  2.211  0.   ]
 [11.487  2.211  0.   ]
 [-2.553  4.421  0.   ]
 [ 0.     4.421  0.   ]
 [ 2.553  4.421  0.   ]
 [ 7.658  4.421  0.   ]
 [10.211  4.421  0.   ]
 [12.763  4.421  0.   ]
 [-3.829  6.632  0.   ]
 [-1.276  6.632  0.   ]
 [ 1.276  6.632  0.   ]
 [ 3.829  6.632  0.   ]
 [ 6.382  6.632  0.   ]
 [ 8.934  6.632  0.   ]
 [11.487  6.632  0.   ]
 [14.04   6.632  0.   ]
 [-2.553  8.843  0.   ]
 [ 0.     8.843  0.   ]
 [ 2.553  8.843  0.   ]
 [ 5.105  8.843  0.   ]
 [ 7.658  8.843  0.   ]
 [10.211  8.843  0.   ]
 [12.763  8.843  0.   ]
 [15.316  8.843  0.   ]
 [-1.276 11.053  0.   ]
 [ 8.934 11.053  0.   ]]
n_j = 61
atomtype_j = [ 7  6  6  8  7  6  6  6  6  6  8  7  6  6  8  8  7  6  6  8  7  6  6  6
               6  6  8  7  6  6  8  8 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
              29 29 29 29 29 29 29 29 29 29 29 29 29]
positions_j = [
 [ 1.551  3.682  2.545]
 [ 0.91   4.165  3.751]
 [-0.113  5.246  3.433]
 [-0.354  5.553  2.268]
 [-0.717  5.822  4.475]
 [-0.833  5.214  5.829]
 [-2.119  5.82   6.321]
 [-2.203  7.173  5.696]
 [-1.547  7.003  4.324]
 [-0.708  8.216  3.951]
 [-1.087  8.992  3.075]
 [ 0.436  8.38   4.619]
 [ 1.421  9.364  4.22 ]
 [ 2.685  8.695  3.701]
 [ 2.658  7.531  3.307]
 [ 3.749  9.31   3.67 ]
 [11.063  2.45   3.081]
 [10.315  3.107  4.134]
 [ 9.82   4.474  3.684]
 [ 9.97   4.84   2.52 ]
 [ 9.227  5.23   4.611]
 [ 8.877  4.793  5.99 ]
 [ 7.909  5.865  6.409]
 [ 8.347  7.11   5.711]
 [ 8.904  6.624  4.372]
 [10.141  7.409  3.964]
 [10.042  8.406  3.251]
 [11.313  6.958  4.417]
 [12.575  7.48   3.934]
 [13.304  6.453  3.082]
 [12.777  5.374  2.818]
 [14.43   6.688  2.647]
 [ 1.276  2.211  0.   ]
 [ 8.934  2.211  0.   ]
 [11.487  2.211  0.   ]
 [-2.553  4.421  0.   ]
 [ 0.     4.421  0.   ]
 [ 2.553  4.421  0.   ]
 [ 7.658  4.421  0.   ]
 [10.211  4.421  0.   ]
 [12.763  4.421  0.   ]
 [-3.829  6.632  0.   ]
 [-1.276  6.632  0.   ]
 [ 1.276  6.632  0.   ]
 [ 3.829  6.632  0.   ]
 [ 6.382  6.632  0.   ]
 [ 8.934  6.632  0.   ]
 [11.487  6.632  0.   ]
 [14.04   6.632  0.   ]
 [16.592  6.632  0.   ]
 [-2.553  8.843  0.   ]
 [ 0.     8.843  0.   ]
 [ 2.553  8.843  0.   ]
 [ 5.105  8.843  0.   ]
 [ 7.658  8.843  0.   ]
 [10.211  8.843  0.   ]
 [12.763  8.843  0.   ]
 [15.316  8.843  0.   ]
 [-1.276 11.053  0.   ]
 [ 1.276 11.053  0.   ]
 [ 3.829 11.053  0.   ]]
mgoonde commented 3 months ago

Hi,

i tried computing with your input structure, and i am not able to reproduce the error unfortunately. Can you post some more details about your setup:

However, from what you post, it seems there are some problems:

Neo-Teyrall commented 3 months ago

Thanks for your quick response.

The makefile used gfortran: (GNU Fortran (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0)

I used the master.

Yes, it seems that the copy went wrong.

Yes, it is two molecules on a surface. It is not from a periodic box; there are two states. The shift on the surface keeps only the atoms that interact with the molecules here. From these two states I gave you, no rotation is present, only translation. However, in other states of my dataset, there are states with rotation and translation compared to other states.

I am trying to compare the dimer organization on the surface. This comparison needs to take into account the interaction between the first and second molecules and their interaction with the surface.

If, in the comparison, I take into account only the atoms of one molecule of the system, I can observe that some states have the same conformation but not the same orientation to the surface. To differentiate these states, I need to include the surface in the comparison.

mgoonde commented 3 months ago

From these two states I gave you, no rotation is present, only translation.

As far as i see, the two molecules rotate and distort slightly, there does not appear to be much translation in them.

Note that IRA explicitly solves for rotation and translation of the all the atoms in the structure. If you are trying to keep the rotation fixed, this is not the right tool for you.

I am trying to compare the dimer organization on the surface.

Ok, I am not sure which dimers you are talking about, the nearest distances between the surface atoms are all identical. From structure i to j, there is one atom which appears, and two atoms which move, the other atoms are completely not moved.

This comparison needs to take into account the interaction between the first and second molecules and their interaction with the surface.

I hope you realize that IRA does not take into account any interaction between the atoms, and it takes the whole structure that you input as a single, rigid entity.

It seems you are trying to compare only the surface atoms by their absolute positions, and not the molecules themselves. Since there is no global translation or rotation in your box, you could look into the CShDA distance only, or other Linear-Assignment-Problem (LAP) solvers, which compute the permutation of atoms, and based on that you could compute the distance among the surface atoms from structure i to j.

Neo-Teyrall commented 2 months ago

Thanks for your time,

I will try to be clearer.

Here you can find three PDB files. The extension has been modified to a .txt file just to enable importation in GitHub, as GitHub does not recognize PDB files. These files represent three states of systems containing a surface and two molecules (two tripeptides). The three states are equivalent.

The first one, 000011-0.txt, is a reference state. 000011-0.txt

The second one, 000012-0.txt, is the reference state but with a permutation occurring between the two molecules. 000012-0.txt

The third one, 000013-0.txt, is a state that has been translated and rotated on the surface. 000013-0.txt

This is an example sample. Notice that the states in a true dataset can combine permutation and translation with intra-molecular perturbation.

Based on that, I want to use IRA to estimate the difference between the states. I hope that IRA will estimate the difference close to zero (because the states are objectively the same).

But I noticed that the surface is an issue. Aligning all the surface atomes seems useless. So, to use IRA, I kept only the atoms of the surface which interact with the atoms of the molecules. This resulted in the data I sent you in the first message (the data I sent is not from these states but from others).

However, my problem is that when I compare states that do not have the same number of atoms using the Python API, it crashes with the message I initially sent you.

if you can not reproduce my error, can i have the git version you used ? Or how can i send you more information?

Thanks again.

mgoonde commented 2 months ago

Thanks for this info. In general, IRA is not the best tool for using when you want to use the distance value as general measure of similarity. IRA gives the real global minimum of that value only when the distortions in the structure are relatively small (see also: IRA docs). But it is true that you case is slightly specific.

As far as i understand your problem, you want to keep the surface atoms "fixed", i mean to allow rotations of the molecule only around the axis normal to the surface, so the surface will not move by this rotation. This complicates the situation a bit, and i think only the call to ira.match() will not be sufficient. Let me know if i understand your problem correctly, i can elaborate more steps you could try to do.

For the error, i tried with gfortran v9.5.0 and i am still not able to reproduce it. Unfortunately i don't have access to older versions. Is there a specific reason you use v9.4.0? Can you update this to a newer version?

Neo-Teyrall commented 2 months ago

I will complete with some information.

The molecules are on the metallic surface, and the position of the molecules relative to the surface is highly important. More precisely, if you observe the surface, you will notice that it is periodic, and the atoms are arranged in a pattern. So if the molecules of states A (figure below) are translated or rotated by a specific value (period shift), they will have the exact same interaction with the surface (states B, Figure below).

On the other hand, if you translate a molecule by less than the period shift, the molecule will be between two periods. This means that the molecule will not have the same interaction (see C in the figure below).

molecules

Therefore, performing a match based only on the molecules is not accurate enough because the surface information is not taken into account. My idea is to perform a match of the molecules' atoms with some selected atoms of the surface (see the figure above).

If I understand correctly how IRA works, it will try to align the selected atoms. Thus, states A and B will perfectly match. But state C will not match with states A and B.

If I do not take the surface into account in this manner or another, I am afraid that states A, B, and C will be perfectly matched. Obviously, state C is different due to its surface interaction.

Thank you again, for your time.

For the fortran version, It's my default installed version. I will try with the 9.5 version.

mgoonde commented 2 months ago

I see. This additional info helped. If i understand correctly, you don't really need to solve the full matching problem, since you only wish to compare the structures. This could be done by using some descriptor of atomic structures, which is invariant over rotations and permutations, and not explicitly solving for those things. That would be possibly faster and more efficient than IRA, since you don't actually need to know the things IRA returns, other than a distance value. The fact IRA gives a value of distance is only because it is very simple to compute it once the full solution for rotation, translation, and permutation is known. However, a distance/similarity between two structures can be computed without computing all this (you can search around for some descriptors, they are particularly used in machine learning approaches, for python you can check the "dscribe" module: https://github.com/SINGROUP/dscribe).

Regarding gfortran, you can find some info about installing different versions here: https://fortran-lang.org/en/learn/os_setup/install_gfortran/