mittinatten / freesasa

C-library for calculating Solvent Accessible Surface Areas
http://freesasa.github.io/
MIT License
105 stars 37 forks source link

Classifier for hydrogen atoms might be not working #17

Closed molsim closed 7 years ago

molsim commented 7 years ago

In the following code classifier = freesasa.Classifier("t.config") structure = freesasa.Structure("t.pdb",classifier, options={'hydrogen' : True,'hetatm' : True}) result = freesasa.calc(structure,freesasa.Parameters({'algorithm' : freesasa.LeeRichards,'n-slices' : 200,'n-threads':6})) with these files Archive.zip I'm getting FreeSASA: warning: atom ' DC H5'' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H5''' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H4'' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H3'' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H2'' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H2''' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H1'' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H41' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H42' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H5 ' unknown, guessing element is ' H', and radius 1.100 A FreeSASA: warning: atom ' DC H6 ' unknown, guessing element is ' H', and radius 1.100 A

So the classifier fails to assign radii provided in config file.

mittinatten commented 7 years ago

Hi, Thanks for the bug report. I never wrote a test for this functionality, so I must have broken it somewhere along the way. I'm pretty sure it used to work, so I should have a fix shortly.

Simon

mittinatten commented 7 years ago

I had a look and it turns out this wasn't an actual error: in the python bindings the structure is first initialized with the default classifier and then the radii are reassigned using the classifier provided by the user, if there is one. The first step produces the warnings you saw, but then correct radii where being assigned.

I have rewritten this to avoid the double assignment in the general case, and suppress the warnings when necessary. See the dev branch.

molsim commented 7 years ago

Thank you! Yes, I can confirm that the assignment actually was working because I was able to reproduce the Naccesss results for the hydrogen atom SASA in DNA along nucleosome (see plot). h-sasa_compar

There is however, some jitter upto 1 sqA in atomic SASA values between Naccess and Freesasa. Archive.zip I attach two file for comparison that were run with very high precision (the parameters are in the header). E.g. for H4' 1.422 vs 0.96 Do you think these small differences are normal manifestation of variations in algorithm implementation (Lee Richards in this case) or there still may be some fix for it?

mittinatten commented 7 years ago

I will have detailed look later today, but the quick answer is that I found that I got identical results to Naccess at lower resolutions when using the same radii and same resolution, but that there were differences when I went to higher resolutions (I think z < 0.01). Since the results from the two algorithms in Freesasa approach each other asymptotically the higher the resolution I assumed there was either some difference in the implementation that only shows at high resolution (but as far as I can tell they should be identical), or maybe a round-off/precision error in Naccess. My Fortran skills are pretty poor, so I never managed to figure this out by looking at the code.

One simple check would be for you to do the same analysis at a lower resolution. 200 slices, as in your example, would correspond to z=0.005 in Naccess, do you still get differences if you use 100 slices and z=0.01?

molsim commented 7 years ago

While I'm doing more testing - I found out that I used rounded values of atomic radii, that I provided to Naccess. Some radii in CHARMM ff may be as precise as e.g. 0.2245 A. I truncated them to e.g. 0.22, while used full precision for freesasa - this should be one reason for the discrepancies.

molsim commented 7 years ago

Here is the summary of my testing of freesasa vs Naccess: 1) Naccess does not handle atomic radii with precision of more than 0.01A, while Freesasa does, so any comparison has to be done with atomic radii rounded off to %.2f 2) The atomic SASA values calculated with Naccess at z=0.005 match closely values calculated by Freesasa with n=200. The differences are in the 0.01sqA range which to me is a formidable reproducibility. 3) Once in Naccess precision is increased z=0.0005, 0.00005 - I do not see convergence in Naccess calculation. While Freesasa results on the other hand indeed converge.

I attach the analysis files. Archive.zip

mittinatten commented 7 years ago
  1. Good to know! FreeSASA radii are read and stored with double precision.
  2. That's similar to what I found when I did my tests, great to have an independent confirmation.
  3. Seems like you see the same problems I saw with Naccess last year. In your files, when you go from z = 0.0005 to 0.00005 the difference in area is larger than the difference between 0.005 and 0.0005. I would expect the opposite. This is consistent with there being a small precision error for each slice which adds up when you have many slices.

So we can consider this issue closed then?

molsim commented 7 years ago

Definitely, and thank you for your effort!