project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0

TERs affecting alignment #303

Closed: rimmartin closed this 4 months ago

rimmartin commented 4 months ago

Hi @wojdyr,

Starting with the unmodified https://github.com/project-gemmi/gemmi/blob/master/tests/1orc.pdb, the alignment works:

1orc alignment chain size: 1
get_polymer().empty()? 0
0 0 A A match_count: 64 309
1orc 0 aligned: 
--MEQRITLKDYAMRFGQTKTAKDLGVYQSAINKAIHAGRKIFLTINADGSVYAEEVKDGEVKPFP-----
1orc match_string: 
  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      cigar:2I64M5I
1orc alignment completed 
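The cigar string in this output encodes the alignment columns as run lengths. A minimal Python sketch that decodes it, assuming only the M, I and D operations occur (parse_cigar is a hypothetical helper, not part of gemmi), confirms that 2I64M5I spans 71 columns: the 2 leading gap characters, the 64 matched residues, and the 5 trailing gap characters of the aligned line.

```python
import re


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '2I64M5I' into (length, operation) pairs."""
    # Assumption: only M, I and D operations appear in the string.
    if not re.fullmatch(r"(?:\d+[MID])+", cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MID])", cigar)]


ops = parse_cigar("2I64M5I")
columns = sum(n for n, _ in ops)                # total alignment columns
matched = sum(n for n, op in ops if op == "M")  # columns counted as M

# The aligned sequence printed in the issue, gaps included.
aligned = "--MEQRITLKDYAMRFGQTKTAKDLGVYQSAINKAIHAGRKIFLTINADGSVYAEEVKDGEVKPFP-----"
print(ops, columns, matched, len(aligned))  # [(2, 'I'), (64, 'M'), (5, 'I')] 71 64 71
```

The 64 M columns agree with the reported match_count of 64.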

Now remove the TER record at https://github.com/project-gemmi/gemmi/blob/master/tests/1orc.pdb#L816 (or write the file without that TER), and no alignment is done:

1orc alignment chain size: 1
get_polymer().empty()? 1
1orc alignment completed 

Presumably the chain is no longer recognized as a "polymer".

If we start from the original again and add a final TER after the waters (as pdbWriteOptions.ter_ignores_type=true does, even when the chain ends in water), the alignment is again skipped and the polymer is empty.

If we change the chain ID of all waters to W, chain A is aligned regardless of the TER records.
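The chain-ID workaround described above could be applied as a text pre-processing step on a PDB file written by another tool, before gemmi reads it. A minimal sketch, assuming the fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22), water residue names HOH/WAT/DOD, and that no chain W exists yet; move_waters_to_chain is a hypothetical helper, not a gemmi function. TER and all other records are left untouched.

```python
# Assumption: these are the water residue names used in the input file.
WATER_NAMES = {"HOH", "WAT", "DOD"}
# Records that carry residue name and chain ID in the standard columns.
ATOM_RECORDS = {"ATOM", "HETATM", "ANISOU"}


def move_waters_to_chain(pdb_text: str, new_chain: str = "W") -> str:
    """Rewrite the chain ID (column 22) of water ATOM/HETATM/ANISOU records."""
    if len(new_chain) != 1:
        raise ValueError("PDB chain IDs are a single character")
    out = []
    for line in pdb_text.splitlines(keepends=True):
        record = line[:6].rstrip()     # record name, columns 1-6
        res_name = line[17:20].strip()  # residue name, columns 18-20
        if record in ATOM_RECORDS and len(line) > 21 and res_name in WATER_NAMES:
            # Replace only the single chain-ID character (0-based index 21).
            line = line[:21] + new_chain + line[22:]
        out.append(line)
    return "".join(out)
```

Calling setup_entities() after reading, as suggested later in the thread, avoids editing the file at all; this sketch only mirrors the manual experiment.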

For our own round trip, we can write the file with gemmi so that the TER records work when it is read back in. The problem arises when PDB files are produced by some other tool.

rimmartin commented 4 months ago

mmCIF works. I'm not sure how other tools' mmCIF output will differ.

wojdyr commented 4 months ago

Calling setup_entities() after reading a file should fix this.
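In the Python bindings, the suggested fix might look like the sketch below. It assumes the gemmi Python package is installed; gemmi.read_structure(), Structure.setup_entities() and Chain.get_polymer() are gemmi calls, while polymer_lengths and its setup flag are hypothetical names used only to make a before/after comparison easy.

```python
def polymer_lengths(path: str, setup: bool = True) -> list[tuple[str, int]]:
    """Return (chain name, polymer length) for each chain in the first model."""
    import gemmi  # imported here so the sketch can be loaded without gemmi installed

    st = gemmi.read_structure(path)
    if setup:
        # The maintainer's suggestion: set up entities right after reading.
        st.setup_entities()
    # get_polymer() returns a residue span; its length is the polymer residue count.
    return [(chain.name, len(chain.get_polymer())) for chain in st[0]]
```

For example, polymer_lengths("1orc.pdb", setup=False) and polymer_lengths("1orc.pdb") can be compared on a copy of the file with the TER record removed.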

Relying on TER records is a known problem in gemmi; it was last reported in #299.

rimmartin commented 4 months ago

Ah, I tried it for both cases: I get a match count of 64 and the same alignment as above.

rimmartin commented 4 months ago

Since there is another issue about TER records (#299), I'm closing this one.