Closed JoaoRodrigues closed 6 years ago
Hi Joao,
I agree that allowing the user to provide reference values in the config file would be a useful addition; I just never got around to implementing it. So your request is a perfect occasion to have a look at it again.
The extended Ala-X-Ala conformations used to calculate maximum SASA can be found in scripts/rsa/. The configurations were generated using Profasi (http://cbbp.thep.lu.se/activities/profasi/). The files classifier_*.c were generated with scripts/config2c.pl from the classifier configuration files and the AXA input. I did it this way because I didn't have access to the exact conformations used for NACCESS, and I wanted to use the same configurations for the OONS, ProtOr, and NACCESS radii. But now that I have had a closer look at the Ala-X-Ala files, I see that some are reasonable and some are quite bad. Something went a bit too fast here (embarrassing that this slipped through). So thanks for noticing the discrepancies; this needs to be fixed as quickly as possible.
So, to begin with, a quick fix in eb2e6d3. I generated tripeptides in Pymol and recalculated reference values from those. Probably it's better to have standard values for NACCESS, will think about what would be the best syntax for specifying that in the config file.
I was about to send a pull request with the original NACCESS values. I contacted the author a while ago (when I first tested FreeSASA vs NACCESS actually) and he shared with me the set of extended peptides. I can share these with you if you want to have a set.
That would be great; if we could add those to the repository it would be even better. (He seems to be positive about the project in general: https://f1000research.com/articles/5-189/v1#referee-response-12524)
Will do, thanks!
As for point 1, that's great by the way!
I have merged #28, thanks!
For the few amino acids I sampled the totals are now close (but not identical). I'm not sure what resolution was used for NACCESS, but I used n=1000 (corresponds to z=0.001) in the calculation. However, there are differences in the definitions of main-chain/side-chain and polar/apolar, so these values are still different. I use C+N+O+CA as main-chain, and define all carbons as apolar.
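The resolution pairings quoted in this thread (z=0.001 with n=1000, z=0.01 with n=100, z=0.05 with n=20) are all consistent with n = 1/z. A minimal sketch of that conversion, assuming the relationship holds in general; the function name is hypothetical and nothing here has been checked against either program's documentation:

```python
def slices_for_z(z: float) -> int:
    """Map a NACCESS-style z value to the FreeSASA n it is paired with
    in this thread, assuming n = 1/z (an assumption, not a documented rule)."""
    if z <= 0:
        raise ValueError("z must be positive")
    return round(1.0 / z)


# The three pairings mentioned in the thread.
for z in (0.05, 0.01, 0.001):
    print(f"z={z} -> n={slices_for_z(z)}")
```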
I haven't verified that the classifier for naccess gives the same classes for all atoms as in naccess. Perhaps we need more than three types of carbon.
Possibly we should allow the configuration files to define which atoms are backbone and which are not. At the moment this is a global function.
So, when I benchmarked the code a couple of years back, the results were exactly the same. Unless you changed something, they should be the same. NACCESS uses z=0.05. I can have a look at reproducing the numbers exactly later in the month, don't worry about it!
Thanks again!
Yes, I also had very exact benchmark results when I wrote the paper. I would want to use high resolution for the default configuration, but it makes sense to make NACCESS a special case.
If using z=0.05 doesn't help, I'll have a look at the polar/apolar definitions, perhaps they don't match the original completely.
Ok, I've investigated further. I used ARG.pdb as an example. I get very close to identical RSA files in the ABS columns using z=0.01 in NACCESS and n=100 in FreeSASA, or z=0.05 and n=20. If a calculation is the same as the reference all REL columns should be 100.0. That's what I get with FreeSASA at n=1000 (which is verified in the test-suite).
NACCESS z=0.01:
REM RES _ NUM      All-atoms   Total-Side   Main-Chain    Non-polar    All polar
REM                ABS   REL    ABS   REL    ABS   REL    ABS   REL    ABS   REL
RES ALA S   1   164.19 152.1  83.16 119.8  81.03 210.2  85.28 119.5  78.91 215.7
RES ARG S   2   237.97  99.7 200.35  99.6  37.61 100.3  77.06  99.0 160.91 100.0
RES ALA S   3   159.00 147.3  71.06 102.4  87.94 228.2  83.76 117.3  75.24 205.7
FreeSASA n=100:
REM RES _ NUM      All-atoms   Total-Side   Main-Chain    Non-polar    All polar
REM                ABS   REL    ABS   REL    ABS   REL    ABS   REL    ABS   REL
RES ALA S   1   164.19 152.2  68.20 106.7  95.99 218.5  85.28 119.8  78.91 215.0
RES ARG S   2   237.97  99.8 196.20  99.8  41.76 100.1  77.06  99.8 160.91  99.9
RES ALA S   3   159.00 147.4  62.41  97.6  96.59 219.8  83.76 117.7  75.24 205.0
NACCESS z=0.05:
REM RES _ NUM      All-atoms   Total-Side   Main-Chain    Non-polar    All polar
REM                ABS   REL    ABS   REL    ABS   REL    ABS   REL    ABS   REL
RES ALA S   1   163.75 151.7  82.84 119.3  80.91 209.9  84.99 119.1  78.76 215.3
RES ARG S   2   239.14 100.2 201.08  99.9  38.07 101.5  77.20  99.2 161.95 100.6
RES ALA S   3   158.42 146.8  70.35 101.4  88.07 228.5  83.07 116.4  75.35 206.0
FreeSASA n=20:
REM RES _ NUM      All-atoms   Total-Side   Main-Chain    Non-polar    All polar
REM                ABS   REL    ABS   REL    ABS   REL    ABS   REL    ABS   REL
RES ALA S   1   163.75 151.8  68.03 106.4  95.72 217.8  84.99 119.4  78.76 214.5
RES ARG S   2   239.14 100.3 197.29 100.3  41.85 100.3  77.20 100.0 161.95 100.5
RES ALA S   3   158.42 146.8  62.31  97.4  96.11 218.7  83.07 116.7  75.35 205.3
NACCESS is off by a few tenths of a percent in both cases, so either the reference was calculated using some other z, there is some round-off error involved, or something has changed in NACCESS between when the reference values were calculated and the version I'm using (2.1.1).
I tried z=0.005 and z=0.001 and got similar results: REL values were off by 0.1-0.2 percent.
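To make the comparison between the tables easier to read, here is a small sketch that subtracts the FreeSASA n=100 ABS values from the NACCESS z=0.01 ones, column by column. The numbers are copied from the tables in this thread; the column keys and the function name are just illustrative. The All-atoms, Non-polar and All polar columns come out identical, while Total-Side and Main-Chain differ by equal and opposite amounts, which is what one would expect if the two programs assign some atom to different groups.

```python
# ABS values (in the units printed by both programs) copied from the
# NACCESS z=0.01 and FreeSASA n=100 tables in this thread.
COLUMNS = ("all", "side", "main", "nonpolar", "polar")

NACCESS = {
    ("ALA", 1): (164.19, 83.16, 81.03, 85.28, 78.91),
    ("ARG", 2): (237.97, 200.35, 37.61, 77.06, 160.91),
    ("ALA", 3): (159.00, 71.06, 87.94, 83.76, 75.24),
}
FREESASA = {
    ("ALA", 1): (164.19, 68.20, 95.99, 85.28, 78.91),
    ("ARG", 2): (237.97, 196.20, 41.76, 77.06, 160.91),
    ("ALA", 3): (159.00, 62.41, 96.59, 83.76, 75.24),
}


def abs_differences(a, b):
    """Return {residue: {column: a - b}}, rounded to two decimals."""
    return {
        res: {col: round(x - y, 2) for col, x, y in zip(COLUMNS, a[res], b[res])}
        for res in a
    }


for res, diff in abs_differences(NACCESS, FREESASA).items():
    print(res, diff)
```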
Although being identical to legacy has its benefits, so does internal consistency, so I'm not sure what's best here.
I quickly checked the naccess.config file, I think you count CB as Main-Chain. Could you double check that? Otherwise, it's perfect (except for rounding errors here and there).
Main-chain is defined statically by the function freesasa_atom_is_backbone()
https://github.com/mittinatten/freesasa/blob/master/src/classifier.c#L915
So it can't be defined in config files at the moment. As you can see from the code, main-chain is C, N, O, CA and OXT. I think NACCESS excludes CA from backbone. Can't check right now, but that should be obvious if you run it for the Gly-tripeptide.
You are right. From the NACCESS README:
By default, alpha carbon atoms are considered to be part of the amino acid side chain so that glycines possess a relative side chain accessibility. To switch this off, use the -b option. WARNING: Please note that when using this option, the sidechain and mainchain %accessibilities for all residues will be wrong.
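The two conventions discussed above can be sketched as a single toggle. This is an illustration in Python, not the code in classifier.c: the names is_backbone and split_sasa are hypothetical, the CA-included set is the one listed in this thread for FreeSASA, and treating the NACCESS default as that same set minus CA is an assumption based on the README quote. The per-atom areas are made up.

```python
# Backbone atom names as listed in this thread for FreeSASA.
BACKBONE_WITH_CA = frozenset({"C", "N", "O", "CA", "OXT"})
# Assumed NACCESS default: same set with CA moved to the side chain.
BACKBONE_WITHOUT_CA = BACKBONE_WITH_CA - {"CA"}


def is_backbone(atom_name: str, ca_is_backbone: bool = True) -> bool:
    """Classify an atom name as main-chain under the chosen convention."""
    names = BACKBONE_WITH_CA if ca_is_backbone else BACKBONE_WITHOUT_CA
    return atom_name.strip().upper() in names


def split_sasa(atom_sasa, ca_is_backbone=True):
    """Sum per-atom SASA into a (main_chain, side_chain) pair."""
    main = sum(a for n, a in atom_sasa.items() if is_backbone(n, ca_is_backbone))
    side = sum(a for n, a in atom_sasa.items() if not is_backbone(n, ca_is_backbone))
    return main, side


# Made-up per-atom areas for a glycine, for illustration only.
gly = {"N": 10.0, "CA": 30.0, "C": 5.0, "O": 25.0}
print(split_sasa(gly, ca_is_backbone=True))   # CA counted as main chain
print(split_sasa(gly, ca_is_backbone=False))  # CA counted as side chain
```

With CA in the backbone, glycine has no side-chain area at all, which is why its relative side-chain accessibility is only defined under the other convention.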
Interestingly, I also ran NACCESS on the tripeptides and I get slightly different relative accessibilities compared to the standard.data file but I cannot figure out where the little discrepancy comes from. Nitpicking anyway, I'd just recalculate standard.data based on your code and the original peptides, to be consistent.
Could possibly provide something similar to the -b option, although that would require some rewiring. Or, just make a note somewhere that FreeSASA with naccess-configuration corresponds to the naccess -b option, and that reference SASAs used to calculate REL were generated using the -b definition for main-chain/side-chain.
I checked all 20 reference PDBs and all five ABS columns are identical between FreeSASA and naccess with option -b. So the backbone definitions are under control, and the polar/apolar definitions are the same.
So aef643bbbaab12c2 already has reference values calculated from the new PDBs, and should be ready to use.
Closing this issue, the remaining tasks have been moved to #31. Will save them for a later release (2.0.3?).
Hi Simon,
Great to see you keep working on FreeSASA!
I was trying to calculate relative solvent accessibilities (per-residue) and found some trouble.
In classifier_naccess.c, for example, for Arginine, it shows:
If I look at the reference values in NACCESS (which are what SASAs are normalized for) it shows:
What are the reference values then? In NACCESS, the author calculated them using a specific set of Ala-X-Ala peptides in extended conformation. I got different values using extended conformations generated with Pymol, so YMMV ... But if you provide NACCESS values, maybe sticking to those in the code would be better.
Cheers!