wwpdb-dictionaries / mmcif_pdbx

wwPDB PDBx/mmCIF Dictionary
Creative Commons Zero v1.0 Universal

Space group missing in _symmetry.space_group_name_H-M #58

Open drlemmus opened 4 months ago

drlemmus commented 4 months ago

The controlled vocabulary for _symmetry.space_group_name_H-M is missing the value 'C 1 2/c 1', which is the space group for PDB entry 5jzq. Could you add it?

CV-GPhL commented 4 months ago

Could we maybe take that opportunity for some further cleanup of that enumerated list? I was thinking of the following ...

Remove some odd (?) SGs that are not even used within the PDB archive (looking at derived_data/index/crystal.idx):

A 1
B 2 21 2
C 2
C 2(A 112)
C 21
I 21
P 2
P 21
P 21(C)

(Other slightly odd ones, but with actual PDB entries, are C 4 21 2, F 4 2 2 and P 21 21 2 A.)
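A check like the one described above could be sketched roughly as follows. This is only an illustration: the helper name `unused_enumeration_values` is hypothetical, the two small inputs are illustrative subsets rather than the real enumeration or archive contents, and it assumes the space group names have already been extracted from the index file into a plain set of strings (the layout of crystal.idx is not assumed here).

```python
def unused_enumeration_values(enumeration, used_in_archive):
    """Return enumeration values that never occur in the archive.

    Hypothetical helper: both arguments are plain iterables of H-M symbols.
    Runs of whitespace are collapsed before comparing, so 'P  21' == 'P 21'.
    The order of the enumeration is preserved in the result.
    """
    used = {" ".join(name.split()) for name in used_in_archive}
    return [v for v in enumeration if " ".join(v.split()) not in used]


# Illustrative subsets only, not the real dictionary or archive contents.
enumeration = ["A 1", "C 1 2 1", "F 4 2 2", "P 21(C)"]
used_in_archive = {"C 1 2 1", "F 4 2 2"}

print(unused_enumeration_values(enumeration, used_in_archive))
```

With these illustrative inputs, the values reported as unused are the ones that would be candidates for removal.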

Should this maybe be a complete list of H-M symbols for all SGs - instead of just non-chiral/protein ones plus a selected subset of chiral ones added as-needed?

wojdyr commented 4 months ago

The only one that bothers me is C 4 21 2. It's used in two files with such a remark:

REMARK   3  SPACE GROUP C 4 21 2 (WHICH, MORE PROPERLY, SHOULD BE NAMED         
REMARK   3   C 4 2 21) IS A NON-STANDARD REPRESENTATION OF SPACE GROUP          
REMARK   3   P 4 21 2.  IN THIS CASE THE AXES OF THE UNIT CELL ARE              
REMARK   3   CONSIDERED TO BE LEFT-HANDED. 

But I'm not sure if C 4 2 21 would be a matching name for the symops.

Other names used in the PDB entries are

githubgphl commented 4 months ago

Will anyone ever nowadays start using "A 1" or "F 4 2 2" as a SG name? Or: should they? A lot of those were originally triggered by restrictions in old, non-general software (I remember a specific phasing program in my old lab that couldn't handle non-orthogonal SGs, resulting in the search for new crystallisation conditions that avoided those SGs).

Even if something like "F 4 2 2" is possible and used in 2 (!) PDB entries from one (1) publication in 1987 (maybe to show a specific relation to other structures ...), the standard setting is "I 4 2 2". I think the "_symmetry.space_group_name_H-M" enumeration should stick with standard settings plus truly common alternate settings that are used in more than a handful of old PDB entries. E.g. "A 1" is neither standard nor used in any PDB entry and should not be in that list I think.
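As a rough sketch of that rule (standard settings plus alternates used in more than a handful of entries): the function name `select_enumeration` and the cut-off `min_entries=3` are assumptions made for illustration, since "a handful" is not defined anywhere. The counts for "A 1" and "F 4 2 2" are the ones mentioned above; the standard-settings set is a one-element illustrative subset.

```python
def select_enumeration(entry_counts, standard_settings, min_entries=3):
    """Keep standard settings, plus alternate settings with enough entries.

    Hypothetical helper. entry_counts maps an H-M symbol to the number of
    PDB entries using it; min_entries is an assumed threshold, not a value
    agreed anywhere in this discussion.
    """
    names = set(entry_counts) | set(standard_settings)
    return sorted(
        n for n in names
        if n in standard_settings or entry_counts.get(n, 0) >= min_entries
    )


# Counts taken from the discussion; the standard set is illustrative only.
entry_counts = {"A 1": 0, "F 4 2 2": 2}
standard_settings = {"I 4 2 2"}

print(select_enumeration(entry_counts, standard_settings))
print(select_enumeration(entry_counts, standard_settings, min_entries=1))
```

Lowering `min_entries` to 1 turns this into the stricter rule of keeping every setting that appears in at least one PDB entry, so "F 4 2 2" would be retained while "A 1" would still be dropped.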

drlemmus commented 4 months ago

We can only drop cases not in the PDB at all. Anything in the PDB, however rare, should be kept. That said, perhaps some weird settings can be reset if that gets rid of some weird cases.

githubgphl commented 3 months ago

> We can only drop cases not in the PDB at all. Anything in the PDB, however rare, should be kept.

Agree.

My suggestions would be