Closed FilomenoSanchez closed 7 months ago
This CIF file from MMDB differs from the mmCIF files distributed by the PDB in many ways, and it does not conform to the spec.
The particular problem here is that `_atom_site.label_entity_id` is set to 1 for all atoms. In mmCIF files, each so-called entity is explicitly assigned one of the types: polymer, branched, non-polymer, water. Here, this is missing, but since all residues are explicitly assigned the same entity_id, they are all treated as being of the same type; gemmi assumes that this type is polymer. I'd need to add a sanity check to detect when entity_id is a dummy value.
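To illustrate what such a sanity check could look like, here is a minimal sketch in plain Python. It is not gemmi's implementation: the helper name `find_suspicious_entity_ids`, the `WATER_NAMES` set, and the input format are all assumptions made for this example. It flags an entity id as suspicious when both water and non-water residues share it, which cannot happen if entity ids are meaningful.

```python
from collections import defaultdict

# Residue names treated as water here; an assumption for this sketch.
WATER_NAMES = {"HOH", "DOD", "WAT", "H2O"}

def find_suspicious_entity_ids(residues):
    """Return entity ids shared by both water and non-water residues.

    `residues` is an iterable of (residue_name, entity_id) pairs, e.g. taken
    from the _atom_site.label_comp_id and _atom_site.label_entity_id columns.
    """
    kinds = defaultdict(set)
    for name, entity_id in residues:
        kinds[entity_id].add("water" if name.upper() in WATER_NAMES else "other")
    # An entity that contains both kinds of residue is likely a dummy value.
    return sorted(eid for eid, seen in kinds.items() if len(seen) > 1)

# Every residue shares entity id "1", as in the file discussed here.
print(find_suspicious_entity_ids([("ALA", "1"), ("GLY", "1"), ("HOH", "1")]))
```

A file in which waters have their own entity id would produce an empty list.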
As a workaround, you can call:
structure.add_entity_types(overwrite=True)
which re-assigns entity types. After that, removing ligands and waters should work. The `overwrite` option was fixed only recently, so it might not work in 0.6.3.
Anyway, it'd be better to not use mmCIF files like this.
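Put together, the workaround might look like the sketch below. The wrapper name `strip_ligands_and_waters` is hypothetical; it only chains the two calls mentioned in this thread, `add_entity_types(overwrite=True)` and `remove_ligands_and_waters()`, on whatever structure object it is given.

```python
def strip_ligands_and_waters(structure):
    """Re-derive entity types, then drop ligands and waters in place.

    `structure` is expected to be a gemmi.Structure; the function itself only
    relies on the two methods called below.
    """
    # overwrite=True discards the entity types assigned when the file was read.
    structure.add_entity_types(overwrite=True)
    structure.remove_ligands_and_waters()
    return structure
```

With gemmi installed, this would be called as `strip_ligands_and_waters(gemmi.read_structure(path))`, where `path` points to the problematic file. As noted above, the `overwrite` option may not behave correctly in 0.6.3.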
Thanks for the information. I agree that ideally it would be better not to use such mmCIF files, but I have no other option at the moment. Regarding the suggested fix, `structure.add_entity_types(overwrite=True)`: is this in any way similar to `gemmi::setup_entities` from v0.6.2? We are currently using `gemmi::setup_entities` with the newly created structures, but the problem persists. Also, the approach you suggested on Pumble (mmdb -> gemmi -> mmCIF -> gemmi) does not seem to solve the issue. It might be that we need to wait until the serialization format is available...
Hi @wojdyr, sorry to bother you with this again. I've got another file (also originating from mmdb) where `_atom_site.label_entity_id` is set to 1 for atoms in residues, 2 for waters and 3 for the ligand (link to the file here). In this case, `gemmi::remove_ligands_and_waters` can remove the waters, but it seems unable to remove the ligands. Any idea why?
`add_entity_types(overwrite=True)` overwrites entity types that were assigned before.
`setup_entities()` calls `add_entity_types()`, but without the overwrite flag, so it doesn't change the types that were assigned previously.
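The difference can be shown with a toy model in plain Python. This is not gemmi's code: `assign_types`, the unit names, and the dictionaries are made up for illustration, and only the role of the overwrite flag mirrors what is described above.

```python
def assign_types(current, guessed, overwrite=False):
    """Toy model: fill in entity types, optionally replacing existing ones.

    `current` maps a unit name to its assigned type, or None if unknown.
    `guessed` maps a unit name to the type a heuristic would pick.
    """
    result = dict(current)
    for unit, guess in guessed.items():
        # Without overwrite, only units with no type yet are touched.
        if overwrite or result.get(unit) is None:
            result[unit] = guess
    return result

# Everything was labelled "polymer" on reading, as with the dummy entity_id.
current = {"protein": "polymer", "waters": "polymer", "ligand": "polymer"}
guessed = {"protein": "polymer", "waters": "water", "ligand": "non-polymer"}

print(assign_types(current, guessed))                  # types stay as they were
print(assign_types(current, guessed, overwrite=True))  # types are re-assigned
```

In this model the first call leaves the waters and the ligand labelled as polymer, which is why a call without the overwrite flag would not help with the file in question.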
I have now changed how mmCIF is read: if entity types are not specified explicitly in the file, they are not assigned automatically inside `read_structure()`. The PDB has conventions in its files that distinguish polymers from non-polymers and branched entities: label_seq_id is null for all residues except polymers, and each non-polymer residue has a separate label_asym_id. But other programs may not observe these conventions, so let's not rely on them.
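For illustration, the first of these conventions can be checked with a few lines of plain Python. The function `classify_by_pdb_conventions` and its labels are hypothetical and not part of gemmi. It groups rows by label_asym_id and marks a unit as inconsistent when null and non-null label_seq_id values are mixed, which is the kind of file where the convention cannot be trusted.

```python
def classify_by_pdb_conventions(rows):
    """Label each label_asym_id as 'polymer' or 'non-polymer-like'.

    `rows` is an iterable of (label_asym_id, label_seq_id) pairs, with
    label_seq_id given as "." or "?" when it is null in the file.
    """
    result = {}
    for asym_id, seq_id in rows:
        is_null = seq_id in (".", "?")
        kind = "non-polymer-like" if is_null else "polymer"
        # Mixed null and non-null seq ids in one unit break the convention.
        if result.setdefault(asym_id, kind) != kind:
            result[asym_id] = "inconsistent"
    return result

rows = [("A", "1"), ("A", "2"), ("B", "."), ("C", ".")]
print(classify_by_pdb_conventions(rows))
```

Here "non-polymer-like" covers non-polymers, branched entities and waters alike. The sketch does not check the second convention, that each non-polymer residue has its own label_asym_id.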
Hi @wojdyr, thanks again for the package and for taking the time to maintain it. I'm running into issues when I use `gemmi::remove_ligands_and_waters`, both in C++ and Python, with version 0.6.3. When I read in the attached CIF file (more info about how I got this file at the end), I seem to be unable to remove ligands and waters. For context, this structure has two chains: chain A has 348 waters and chain B consists only of 2 ligands. Here's the code:

If I inspect the residues in chain A, I can see that the water residues have the `is_water` flag correctly set to `True`. For instance:

Also, `gemmi::remove_waters` seems to work fine (although of course it does not remove ligands, which is what I want):

You can find the file here: 5a3h.zip. Regarding the origin of this file, it was created by writing a CIF file from a `mmdb::Manager`; here's the code:

This issue is reproducible with all files generated in this way. I'm not sure whether this is a gemmi issue or whether there's something wrong with the files produced with the above code, but visually inspecting the contents of the file I can't see anything wrong with them. Also, the fact that residues for which `is_water()` returns `True` are kept even after running `gemmi::remove_ligands_and_waters` might indicate that there's something wrong with this function.