project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0
205 stars 42 forks source link

Bug in `get_all_unit_cell_sites()` #308

Closed fxcoudert closed 2 months ago

fxcoudert commented 2 months ago

Either this is a bug, or I'm completely misunderstanding what this function is supposed to do:

$ cat test.cpp
// C++ headers
#include <iostream>

// Some header-only libraries
#include <gemmi/cif.hpp>
#include <gemmi/smcif.hpp>

int main(void)
{
  auto block = gemmi::cif::read_file("FAU.cif").sole_block();
  auto st = gemmi::make_small_structure_from_block(block);

  std::cout << "spacegroup: " << st.spacegroup_hm << std::endl;
  std::cout << "nsites: " << st.sites.size() << std::endl;

  auto allsites = st.get_all_unit_cell_sites();
  std::cout << "natoms: " << allsites.size() << std::endl;
}
$ clang++ test.cpp -I gemmi/include -std=c++11 && ./a.out
spacegroup: F d 3 m
nsites: 5
natoms: 5

The first 5 is the number of sites alright, but after application of symmetry operations, there should be 576 sites/unit cell. Right now, it seems that the function is doing nothing.

This is the CIF file I used, but I see it on many others:

$ cat FAU.cif                                                     
data_FAU

#**************************************************************************
#
# CIF taken from the IZA-SC Database of Zeolite Structures
# Ch. Baerlocher and L.B. McCusker
# Database of Zeolite Structures: http://www.iza-structure.org/databases/ 
#
# The atom coordinates and the cell parameters were optimized with DLS76
# assuming a pure SiO2 composition.
#
#**************************************************************************

_cell_length_a                  24.3450(0)
_cell_length_b                  24.3450(0)
_cell_length_c                  24.3450(0)
_cell_angle_alpha               90.0000(0)
_cell_angle_beta                90.0000(0)
_cell_angle_gamma               90.0000(0)

_symmetry_space_group_name_H-M     'F d 3 m'
_symmetry_Int_Tables_number         227
_space_group.IT_coordinate_system_code  '2'
_symmetry_cell_setting             cubic

loop_
_symmetry_equiv_pos_as_xyz
'+x,+y,+z'
'+x,1/2+y,1/2+z'
'1/2+x,1/2+y,+z'
'1/2+x,+y,1/2+z'
'+z,+x,+y'
'+z,1/2+x,1/2+y'
'1/2+z,1/2+x,+y'
'1/2+z,+x,1/2+y'
'+y,+z,+x'
'+y,1/2+z,1/2+x'
'1/2+y,1/2+z,+x'
'1/2+y,+z,1/2+x'
'-x,1/4+y,1/4+z'
'-x,3/4+y,3/4+z'
'1/2-x,3/4+y,1/4+z'
'1/2-x,1/4+y,3/4+z'
'-z,1/4+x,1/4+y'
'-z,3/4+x,3/4+y'
'1/2-z,3/4+x,1/4+y'
'1/2-z,1/4+x,3/4+y'
'-y,1/4+z,1/4+x'
'-y,3/4+z,3/4+x'
'1/2-y,3/4+z,1/4+x'
'1/2-y,1/4+z,3/4+x'
'1/4+x,-y,1/4+z'
'1/4+x,1/2-y,3/4+z'
'3/4+x,1/2-y,1/4+z'
'3/4+x,-y,3/4+z'
'1/4+z,-x,1/4+y'
'1/4+z,1/2-x,3/4+y'
'3/4+z,1/2-x,1/4+y'
'3/4+z,-x,3/4+y'
'1/4+y,-z,1/4+x'
'1/4+y,1/2-z,3/4+x'
'3/4+y,1/2-z,1/4+x'
'3/4+y,-z,3/4+x'
'1/4-x,3/4-y,1/2+z'
'1/4-x,1/4-y,+z'
'3/4-x,1/4-y,1/2+z'
'3/4-x,3/4-y,+z'
'1/4-z,3/4-x,1/2+y'
'1/4-z,1/4-x,+y'
'3/4-z,1/4-x,1/2+y'
'3/4-z,3/4-x,+y'
'1/4-y,3/4-z,1/2+x'
'1/4-y,1/4-z,+x'
'3/4-y,1/4-z,1/2+x'
'3/4-y,3/4-z,+x'
'+y,+x,+z'
'+y,1/2+x,1/2+z'
'1/2+y,1/2+x,+z'
'1/2+y,+x,1/2+z'
'+x,+z,+y'
'+x,1/2+z,1/2+y'
'1/2+x,1/2+z,+y'
'1/2+x,+z,1/2+y'
'+z,+y,+x'
'+z,1/2+y,1/2+x'
'1/2+z,1/2+y,+x'
'1/2+z,+y,1/2+x'
'1/4+y,-x,1/4+z'
'1/4+y,1/2-x,3/4+z'
'3/4+y,1/2-x,1/4+z'
'3/4+y,-x,3/4+z'
'1/4+x,-z,1/4+y'
'1/4+x,1/2-z,3/4+y'
'3/4+x,1/2-z,1/4+y'
'3/4+x,-z,3/4+y'
'1/4+z,-y,1/4+x'
'1/4+z,1/2-y,3/4+x'
'3/4+z,1/2-y,1/4+x'
'3/4+z,-y,3/4+x'
'-y,1/4+x,1/4+z'
'-y,3/4+x,3/4+z'
'1/2-y,3/4+x,1/4+z'
'1/2-y,1/4+x,3/4+z'
'-x,1/4+z,1/4+y'
'-x,3/4+z,3/4+y'
'1/2-x,3/4+z,1/4+y'
'1/2-x,1/4+z,3/4+y'
'-z,1/4+y,1/4+x'
'-z,3/4+y,3/4+x'
'1/2-z,3/4+y,1/4+x'
'1/2-z,1/4+y,3/4+x'
'3/4-y,1/4-x,1/2+z'
'3/4-y,3/4-x,+z'
'1/4-y,3/4-x,1/2+z'
'1/4-y,1/4-x,+z'
'3/4-x,1/4-z,1/2+y'
'3/4-x,3/4-z,+y'
'1/4-x,3/4-z,1/2+y'
'1/4-x,1/4-z,+y'
'3/4-z,1/4-y,1/2+x'
'3/4-z,3/4-y,+x'
'1/4-z,3/4-y,1/2+x'
'1/4-z,1/4-y,+x'
'-x,-y,-z'
'-x,1/2-y,1/2-z'
'1/2-x,1/2-y,-z'
'1/2-x,-y,1/2-z'
'-z,-x,-y'
'-z,1/2-x,1/2-y'
'1/2-z,1/2-x,-y'
'1/2-z,-x,1/2-y'
'-y,-z,-x'
'-y,1/2-z,1/2-x'
'1/2-y,1/2-z,-x'
'1/2-y,-z,1/2-x'
'+x,3/4-y,3/4-z'
'+x,1/4-y,1/4-z'
'1/2+x,1/4-y,3/4-z'
'1/2+x,3/4-y,1/4-z'
'+z,3/4-x,3/4-y'
'+z,1/4-x,1/4-y'
'1/2+z,1/4-x,3/4-y'
'1/2+z,3/4-x,1/4-y'
'+y,3/4-z,3/4-x'
'+y,1/4-z,1/4-x'
'1/2+y,1/4-z,3/4-x'
'1/2+y,3/4-z,1/4-x'
'3/4-x,+y,3/4-z'
'3/4-x,1/2+y,1/4-z'
'1/4-x,1/2+y,3/4-z'
'1/4-x,+y,1/4-z'
'3/4-z,+x,3/4-y'
'3/4-z,1/2+x,1/4-y'
'1/4-z,1/2+x,3/4-y'
'1/4-z,+x,1/4-y'
'3/4-y,+z,3/4-x'
'3/4-y,1/2+z,1/4-x'
'1/4-y,1/2+z,3/4-x'
'1/4-y,+z,1/4-x'
'3/4+x,1/4+y,1/2-z'
'3/4+x,3/4+y,-z'
'1/4+x,3/4+y,1/2-z'
'1/4+x,1/4+y,-z'
'3/4+z,1/4+x,1/2-y'
'3/4+z,3/4+x,-y'
'1/4+z,3/4+x,1/2-y'
'1/4+z,1/4+x,-y'
'3/4+y,1/4+z,1/2-x'
'3/4+y,3/4+z,-x'
'1/4+y,3/4+z,1/2-x'
'1/4+y,1/4+z,-x'
'-y,-x,-z'
'-y,1/2-x,1/2-z'
'1/2-y,1/2-x,-z'
'1/2-y,-x,1/2-z'
'-x,-z,-y'
'-x,1/2-z,1/2-y'
'1/2-x,1/2-z,-y'
'1/2-x,-z,1/2-y'
'-z,-y,-x'
'-z,1/2-y,1/2-x'
'1/2-z,1/2-y,-x'
'1/2-z,-y,1/2-x'
'3/4-y,+x,3/4-z'
'3/4-y,1/2+x,1/4-z'
'1/4-y,1/2+x,3/4-z'
'1/4-y,+x,1/4-z'
'3/4-x,+z,3/4-y'
'3/4-x,1/2+z,1/4-y'
'1/4-x,1/2+z,3/4-y'
'1/4-x,+z,1/4-y'
'3/4-z,+y,3/4-x'
'3/4-z,1/2+y,1/4-x'
'1/4-z,1/2+y,3/4-x'
'1/4-z,+y,1/4-x'
'+y,3/4-x,3/4-z'
'+y,1/4-x,1/4-z'
'1/2+y,1/4-x,3/4-z'
'1/2+y,3/4-x,1/4-z'
'+x,3/4-z,3/4-y'
'+x,1/4-z,1/4-y'
'1/2+x,1/4-z,3/4-y'
'1/2+x,3/4-z,1/4-y'
'+z,3/4-y,3/4-x'
'+z,1/4-y,1/4-x'
'1/2+z,1/4-y,3/4-x'
'1/2+z,3/4-y,1/4-x'
'1/4+y,3/4+x,1/2-z'
'1/4+y,1/4+x,-z'
'3/4+y,1/4+x,1/2-z'
'3/4+y,3/4+x,-z'
'1/4+x,3/4+z,1/2-y'
'1/4+x,1/4+z,-y'
'3/4+x,1/4+z,1/2-y'
'3/4+x,3/4+z,-y'
'1/4+z,3/4+y,1/2-x'
'1/4+z,1/4+y,-x'
'3/4+z,1/4+y,1/2-x'
'3/4+z,3/4+y,-x'

loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
    O1    O     0.8958    0.1042    0.0000
    O2    O     0.9669    0.0762    0.0762
    O3    O     0.9966    0.1429    0.9966
    O4    O     0.9283    0.1770    0.0730
    T1    Si    0.9469    0.1251    0.0364
wojdyr commented 2 months ago

The spacegroup name in the file is:

_symmetry_space_group_name_H-M     'F d 3 m'

and should be F d -3 m. So the space group is not recognized, and no symmetry operations are applied (_symmetry_equiv_pos_as_xyz is not used when determining spacegroup, only the name).

What would expected in such a case? Throwing an exceptions from get_all_unit_cell_sites()?

fxcoudert commented 2 months ago

Hum, I didn't see that. F d 3 m is the older name for F d -3 m, it is used and accepted by all crystallographic software I know: CrystalMaker, Mercury, Platon, etc. (see for example, for Platon, table 9.1 here: http://cad4.cpac.washington.edu/structures/software/WinGX/manuals/platon.pdf)

It is present in “Table 12.3.4.1. Standard space-group symbols” of the International Tables For Crystallography - Vol A - Space group symmetry (2002 edition).

Maybe it should be recognised as a synonym? Or the software should get the _symmetry_Int_Tables_number that is included in the CIF file, if it does not know the H-M symbol?

wojdyr commented 2 months ago

OK, I see that cctbx also recognizes it without -. I'll look into it.

I just added a way to use space group determined from symmetry operations instead of H-M name:

st.determine_spacegroup_from("x");  // before the spacegroup gets used

(x stands for xyz list). This might be more reliable than the name, because the names can be ambiguous. I could also add using a number as a fallback, but that's even more ambiguous. I'll think about it. If I add numbers, "n" would stand for number, "h" for H-M name.

wojdyr commented 2 months ago

I made two changes to accommodate more spacegroups from the PLATON manual. Now only two of them don't parse: "I2/N" and "I43M". It bothers me a bit, but these two symbols also don't parse in cctbx, which is used in various crystallographic programs, so I guess it's fine.

I'll probably also change how the space group is determined by default, but I don't know yet how it should work. In principle, we can use the list of operations, Hall symbol, H-M symbol and space group number. But I suppose not all these fields are reliable in all the files out there. Throwing error when, for example, the Hall symbol is wrong, could be annoying to some people. OTOH, silently ignoring fields that are wrong and just taking the space group from other fields would make it harder to track what's going on. I'll ask around what space group-related fields should be used, and how much consistency checking is expected when reading a file.

wojdyr commented 2 months ago

FTR, space group settings are now set up in a maximally configurable manner: https://gemmi.readthedocs.io/en/latest/mol.html#smallstructure-spacegroup