mittinatten / freesasa

C-library for calculating Solvent Accessible Surface Areas
http://freesasa.github.io/
MIT License

Selection syntax issues #16

Closed. molsim closed this issue 7 years ago

molsim commented 7 years ago

I'd like to be able to calculate the SASA of individual hydrogen atoms in protein DNA complexes (e.g. nucleosome PDB 1KX5).

Looks like there are several problems with parsing selection syntax in the function below:

freesasa.selectArea(['1T, (chain I) and (resi -72) and (resn DC) and (name H5\'\')'],structure, result)

1) Selection names cannot start with a number ("1T" vs. "T1").
2) Negative resi values are not supported ("-72" vs. "72").
3) Hydrogen atom names with primes are not recognized: H5'' is treated as H5' or H5.

molsim commented 7 years ago

Made a quick fix to issues 2 and 3 by changing lexer.l rules via molsim@cb27bab9bd84fcb7bd7a899cde15f0edc1008d1b

mittinatten commented 7 years ago

Yes, that's the right place to change it. 1 and 3 should be fairly straightforward to fix, and your suggestion for 3 seems correct. For 2, one has to consider that we use minus signs for ranges too, i.e. resi 1-3. If we allow negative numbers, the syntax resi 1-3+5-7 probably won't work anymore, but I think allowing negative residue numbers does make sense, so I'll look into modifying the rules here.

Thanks for pointing this out!

molsim commented 7 years ago

Regarding 2, PDB files actually use negative resi (e.g. PDB 1KX5). PyMOL and other programs usually handle this by escaping the minus sign, e.g. "chain i and resi \-10-\-5", which selects resi between -10 and -5.

mittinatten commented 7 years ago

The last commit to the dev branch adds this functionality. It passes all the tests I could come up with, at least. If it works for you too, I'll merge it to master eventually.

molsim commented 7 years ago

Thank you! I'd also suggest making selection ids (keys) even more flexible. Currently they are restricted to alphanumeric characters and will not work if they contain only digits. Using "-72" as a selection id is not parsable now. Here is an ad hoc fix I use now to allow for that: molsim@eda742451110ded7b0119132b55eaa10c75c3a7f

molsim commented 7 years ago

PS: I just realized that the GitHub syntax in one of my comments also got confused by the escaped minus sign. PyMOL selection syntax, to my knowledge, behaves as follows: "resi \-10" selects residue -10, while "resi -10" will select all residues with resi <= -10.

mittinatten commented 7 years ago
molsim commented 7 years ago

Thank you, Simon! Yes, I got confused by the syntax again. You are correct: "resi -10" should select resi <= 10.

mittinatten commented 7 years ago

There are now fewer restrictions on selection names: alphanumeric characters, '+', '-' and '_' are allowed in any order.
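
As a rough illustration of the rule described in this comment (not the actual rule in lexer.l), a name check could be sketched in Python as follows. The function name `is_valid_selection_name` is hypothetical and not part of FreeSASA.

```python
import re

# Assumption: one or more of [A-Za-z0-9], '+', '-' and '_', in any order.
# This mirrors the description in the comment, not the real lexer rule.
_NAME = re.compile(r"[A-Za-z0-9_+\-]+")

def is_valid_selection_name(name):
    """Return True if the whole name consists only of allowed characters."""
    return _NAME.fullmatch(name) is not None

print(is_valid_selection_name("1T"))      # True
print(is_valid_selection_name("-72"))     # True
print(is_valid_selection_name("my sel"))  # False (contains a space)
```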

mittinatten commented 7 years ago

Pushed code that allows open-ended ranges and where negative indices are escaped with a backslash (see the changes to the changelog or doxy_main.md in the commit above for details). I added quite a few tests, so I am relatively confident this works as intended, but if you have the chance to do some sanity checks too, that would be great!
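
To make the convention concrete, here is a minimal Python sketch of how a resi expression with '+'-joined ranges, open-ended ranges and backslash-escaped negative numbers could be interpreted. It is an illustration of the behaviour discussed in this thread, not FreeSASA's parser; `parse_resi` and the (low, high) tuple format are assumptions made for the example.

```python
import re

# One residue number; a negative number must be written as "\-N".
_NUM = r"(?:\\-)?\d+"
# Optional lower bound, optional range dash, optional upper bound.
_RANGE = re.compile(rf"^(?P<lo>{_NUM})?(?:(?P<dash>-)(?P<hi>{_NUM})?)?$")

def _to_int(token):
    """Convert '12' or '\\-12' to an int; pass None through."""
    return None if token is None else int(token.replace("\\-", "-"))

def parse_resi(expr):
    """Return a list of (low, high) bounds; None marks an open end."""
    bounds = []
    for part in expr.split("+"):
        m = _RANGE.match(part.strip())
        if not m or (m.group("lo") is None and m.group("hi") is None):
            raise ValueError(f"cannot parse {part!r}")
        lo, hi = _to_int(m.group("lo")), _to_int(m.group("hi"))
        if m.group("dash") is None:
            hi = lo  # a single residue, e.g. "5" or "\-72"
        bounds.append((lo, hi))
    return bounds

print(parse_resi("1-3+5-7"))     # [(1, 3), (5, 7)]
print(parse_resi(r"\-10-\-5"))   # [(-10, -5)]
print(parse_resi("-10"))         # [(None, 10)], i.e. resi <= 10
print(parse_resi("10-"))         # [(10, None)], i.e. resi >= 10
```

With the escape in place, an unescaped leading minus can only mean an open-ended range, so "resi 1-3+5-7" keeps its original meaning.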

mittinatten commented 7 years ago

Will close this for now, let me know if you discover any further selection problems.

molsim commented 7 years ago

Thank you, selections work fine for me now!

vmlynsky commented 7 years ago

Hi, I am really sorry for bothering you with a stupid question, but...

Could you please give a simple example of how to 'feed' your script via the "--select" option? I think I am quite familiar with PyMOL, but I cannot figure out which COMMAND should be put there, e.g. when I want to select multiple atoms by index.

Thank you in advance. Best, VM.

mittinatten commented 7 years ago

Hi, there is no option at the moment to select atoms by index. A simple example to select residues by index would be

freesasa 1abc.pdb --select="selection_name, resi 1-4"

mittinatten commented 7 years ago

Full documentation of the subset of PyMOL commands available in FreeSASA can be found here: http://freesasa.github.io/doxygen/Selection.html

vmlynsky commented 7 years ago

Thanks, .. and sorry, I somehow missed that page in your manual.

vmlynsky commented 7 years ago

.. ok, indexes are not supported, but one can use atom names instead. Some relabelling is needed though because our friend Gromacs likes to use atom names with numbers in front. :-)
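
The relabelling mentioned here is not spelled out in the thread. One possible approach, sketched below, is to move any leading digits of an atom name to the end (for example "1HB" becomes "HB1"). The helper name `move_leading_digits` is hypothetical, and whether this renaming matches the naming scheme expected by your other tools is an assumption you would need to verify.

```python
import re

# Leading digits followed by at least one non-digit character.
_LEADING_DIGITS = re.compile(r"^(\d+)(\D.*)$")

def move_leading_digits(atom_name):
    """Move leading digits to the end of an atom name; leave others unchanged."""
    name = atom_name.strip()
    m = _LEADING_DIGITS.match(name)
    return m.group(2) + m.group(1) if m else name

print(move_leading_digits("1HB"))   # HB1
print(move_leading_digits("2HD1"))  # HD12
print(move_leading_digits("CA"))    # CA
```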

mittinatten commented 7 years ago

It's probably possible to allow atom names to start with numbers in the selection. Do you have an example file I could use?

I will look into adding atom selection by index too, but that's a larger project.

emroberts95 commented 5 years ago

Is there a way to select area using cofactors, for example FMN?

I tried using resn because FMN is in the same column as the amino acids, with this code:

FMNselect = freesasa.selectArea(['FMNarea, resn FMN'], structure, result)

However, I keep getting this on the command line:

FreeSASA: warning: Found no matches to resn 'FMN', typo?

mittinatten commented 5 years ago

Hi, that's probably because those atoms are HETATM records. You can include them by passing an option when you initialize the structure.
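
To check whether this is the cause, one can look at which record type the residue uses in the PDB file. The sketch below relies only on the fixed-column PDB layout (record name in columns 1-6, residue name in columns 18-20); the function name `record_types` is hypothetical and not part of FreeSASA.

```python
def record_types(pdb_lines, resn):
    """Return the set of record types (ATOM/HETATM) used by a residue name."""
    found = set()
    for line in pdb_lines:
        record = line[:6].strip()            # columns 1-6: record name
        if record in ("ATOM", "HETATM") and line[17:20].strip() == resn:
            found.add(record)                # columns 18-20: residue name
    return found

# Usage with a file on disk (the path is a placeholder):
# with open("structure.pdb") as handle:
#     print(record_types(handle, "FMN"))
```

If the result contains only "HETATM", the residue is skipped unless HETATM records are explicitly included when the structure is read.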

emroberts95 commented 5 years ago

Could you explain how to do this? This is what I've tried so far:

structure = freesasa.Structure(filename)
addatom = freesasa.Structure.addAtom(residueName = "FMN")

But I'm getting this error on the command line:

TypeError: descriptor 'addAtom' of 'freesasa.Structure' object needs an argument

mittinatten commented 5 years ago

Here's a line from the test suite that tests HETATM inclusion:

Structure("lib/tests/data/1ubq.pdb",None,{'hetatm' : True})
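
Putting the pieces from this exchange together, a minimal sketch of the full workflow might look like the following. It uses only calls that appear in this thread (`freesasa.Structure` with the `hetatm` option and `freesasa.selectArea`) plus `freesasa.calc`; the wrapper `cofactor_area`, the selection name `cofactor` and the file path are assumptions made for the example.

```python
def cofactor_area(pdb_path, resn="FMN"):
    """Return the SASA of all atoms whose residue name is `resn`."""
    # Imported here so this snippet can be loaded without FreeSASA installed.
    import freesasa

    # Third argument: options dict; 'hetatm': True keeps HETATM records.
    structure = freesasa.Structure(pdb_path, None, {"hetatm": True})
    result = freesasa.calc(structure)

    # selectArea returns a dict mapping selection names to areas.
    areas = freesasa.selectArea([f"cofactor, resn {resn}"], structure, result)
    return areas["cofactor"]

# Usage (the path is a placeholder):
# print(cofactor_area("my_protein.pdb", "FMN"))
```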

emroberts95 commented 5 years ago

This worked! Thank you so much for your help!