project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0

Issue creating gemmi selections #289

Closed FilomenoSanchez closed 11 months ago

FilomenoSanchez commented 11 months ago

Hi @wojdyr, thanks for the package, I find it really useful. I've recently noticed an issue when creating selections in structures where chain names contain a "-" symbol. This tends to happen in CIF files for biological assemblies (for example, entry 7BQX). For instance, creating a selection as follows:

import gemmi
sele = gemmi.Selection('//e-2/19(LEU)')

Throws the error:

----> 1 sele = gemmi.Selection('//e-2/19(LEU)')

RuntimeError: Invalid selection syntax in a list near "-2/19(LE": //e-2/19(LEU)

For context, this is using gemmi version 0.6.2. I am not sure whether such chain names are allowed in the CIF format specifications, and since I've only seen this in files for biological assemblies I would understand if you don't consider this an issue and don't want to fix it. Otherwise it would be useful for us to have this working in gemmi. Thanks in advance!

wojdyr commented 11 months ago

Hi Filomeno, I didn't see (or forgot) that chain IDs in the assemblies from RCSB contain "-". I was checking whether a list of chain IDs contains "-" and throwing an error if it does, because "-" in the CID selection can mean a range, for example residues 5-33. So this check would show an error if a user tried to specify chains as A-C. But since "-" can be part of a chain ID, I have now removed this check.

pschmidtke commented 11 months ago

Not sure if I can also comment on this, but I get the error with numerical chain names too. E.g. for 7egb chain 8, I'm doing a selection like '8//', which still yields a runtime error:

RuntimeError: Invalid selection syntax in a list near "/*": 8/*/*

wojdyr commented 11 months ago

Here it's the error message that is not clear. 8/*/* is parsed as residue 8, atoms */*. Then, processing */* as a list triggers the error.

The selection syntax allows omitting leading and trailing fields. It guesses what was omitted from the first value. For a numeric chain name, you need the leading slashes:

//8
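
One way to sidestep the guessing altogether is to always write the leading slashes when building selection strings in code. A minimal sketch, assuming a hypothetical helper named chain_cid (not part of gemmi) that only assembles the string; the result would then be passed to gemmi.Selection:

```python
def chain_cid(chain_id, resnum=None, resname=None):
    """Build a selection string that always starts with '//'.

    Hypothetical helper, not part of gemmi. The model field is left
    empty, so the first value is explicitly the chain ID and cannot be
    mistaken for a residue number.
    """
    cid = f"//{chain_id}"
    if resnum is not None or resname is not None:
        cid += "/"
        if resnum is not None:
            cid += str(resnum)
        if resname is not None:
            cid += f"({resname})"
    return cid


print(chain_cid("8"))                # //8
print(chain_cid("e-2", 19, "LEU"))   # //e-2/19(LEU)

# Assumed usage with gemmi installed:
#   import gemmi
#   sele = gemmi.Selection(chain_cid("8"))
```

Chain IDs containing "-" (like e-2 above) still need a gemmi version that includes the fix for the removed check.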

pschmidtke commented 11 months ago

oh I missed that, sorry, that makes sense. Thanks for the quick reply