Closed FilomenoSanchez closed 11 months ago
This cif file from MMDB is different from cif files from the PDB, and not conforming to the spec, in many ways.
The particular problem here is that _atom_site.label_entity_id
is set to 1
for all atoms. In mmCIF files, each so-called entity is explicitly assigned one of the types: polymer, branched, non-polymer, water. Here, this is missing, but since all residues are explicitly assigned the same entity_id, it means that they are all of the same type – gemmi assumes that this type is polymer. I'd need to add a sanity check to detect when entity_id is a dummy value.
As a workaround, you can call:
structure.add_entity_types(overwrite=True)
which re-assigns entity types. Then removing ligands/waters should work. The overwrite
option was fixed recently, it might not work in 0.6.3.
Anyway, it'd be better to not use mmCIF files like this.
Thanks for the information. I agree that ideally it would be better to not use such mmCIF files but I have no other option at the moment. Regarding the suggested fix structure.add_entity_types(overwrite=True)
, is this in anyway similar to gemmi::setup_entities
from v0.6.2? We are currently using gemmi::setup_entities
with the newly created structures but the problem persists. Also the approach you suggested in pumble mmdb->gemmi->mmCIF->gemmi seems not to solve the issue. It might be that we need to wait until serialization format is available...
Hi @wojdyr, sorry to bother you with this again. I've got another file (also originating from mmdb) where the _atom_site.label_entity_id
is set to 1 for atoms in residues, 2 for waters and 3 for the ligand, link to the file here. In this case, gemmi::remove_ligands_and_waters
can remove the waters but it seems unable to remove the ligands? Any idea why?
add_entity_types(overwrite=True)
overwrites entity types that were assigned before.
setup_entities()
calls add_entity_types()
, but without the overwrite flag, which doesn't change the types that were assigned previously.
I changed now how mmcif is read: if entity types are not specified explicitly in the file, they are not assigned automatically inside read_structure()
. The PDB has conventions in its files that distinguish polymers from non-polymers and branched entities: label_seq_id is null for all residues except polymers, and each non-polymer residue has separate label_asym_id. But other programs may not observe these conventions, so let's not rely on it.
Hi @wojdyr, thanks again for the package and taking the time to take care of it. I'm running into issues when i use
gemmi::remove_ligands_and_waters
both in C++ and python, using version 0.6.3. When I read in the cif file I'm sending attached (more info about how I got this file at the end), I seem to be unable to remove ligands and waters. For context, this structure has two chains, chain A has 348 waters and chain B only consists of 2 ligands. Here's the code:If I inspect the residues in chain A, I can see water residues have correctly set the flag
is_water
toTrue
. For instance:Also,
gemmi::remove_waters
seems to work fine (although of course it does not remove ligands, which is what I want):You can find the file here: 5a3h.zip. Regarding the origin of this file, it was created by writing a CIF file from a
mmdb::Manager
, here's the code:This issue is reproducible with all files generated in this way. I'm not sure whether this is a gemmi issue or if there's something wrong with the files produced with the above code, but visually inspecting the contents of the file I can't see anything wrong with them. Also the fact that residues that return
True
when runningis_water()
are kept even after runninggemmi::remove_ligands_and_waters
might indicate that there's something wrong with this function.