KatarinaYuan opened 2 months ago
Please open an issue in mini3di. It is a community project that we don't maintain.
Hi @KatarinaYuan (and hi @milot-mirdita, thanks for pointing this out in the e-mail).
This difference is caused by some atoms in the linked PDB file being disordered, i.e. having alternate locations (altLoc):
ATOM     33  CB ACYS A   6      21.438  19.816  -0.079  0.50 12.85           C
ATOM     34  CB BCYS A   6      21.428  19.604   0.838  0.50  8.66           C
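Looking at the fixed PDB columns shows why these two records are ambiguous. The alternate-location indicator is the single character in column 17, directly before the residue name, so it appears glued to `CYS` (`ACYS`, `BCYS`). Both alternates have occupancy 0.50. The following stdlib-only sketch slices the standard ATOM column ranges. It is illustrative only and is not how Biopython or Gemmi parse files. `AtomRecord` and `parse_atom_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int
    name: str
    altloc: str  # column 17; empty string when the atom has no alternates
    resname: str
    chain: str
    resseq: int
    coord: tuple[float, float, float]
    occupancy: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse a fixed-width PDB ATOM/HETATM record using 0-based column slices."""
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        altloc=line[16].strip(),
        resname=line[17:20].strip(),
        chain=line[21],
        resseq=int(line[22:26]),
        coord=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        occupancy=float(line[54:60]),
    )


# The two CB records above, spaced to the standard PDB column positions.
RECORDS = [
    "ATOM     33  CB ACYS A   6      21.438  19.816  -0.079  0.50 12.85           C",
    "ATOM     34  CB BCYS A   6      21.428  19.604   0.838  0.50  8.66           C",
]

if __name__ == "__main__":
    for rec in map(parse_atom_line, RECORDS):
        print(rec.name, repr(rec.altloc), rec.resname, rec.occupancy, rec.coord)
```

Both records describe the same atom (CB of CYS A 6) with the same occupancy, so any parser has to apply some rule to keep only one of them.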
The way these are handled differs between Biopython (which mini3di uses) and Foldseek:
- mini3di (via Biopython) keeps the alternate location with the highest occupancy;
- Foldseek (GemmiWrapper.cpp) keeps the last alternate location in the order it appears in the file.
This difference in behaviour causes different atom coordinates to be selected, so in the end the 3Di sequences are different. I can add a flag to mini3di to take the last atom regardless of occupancy, but isn't selecting by occupancy the better choice over taking the last atom in the order it appears in the source file?
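Here is a minimal sketch of the two selection policies described above, applied to the two CB alternates. It does not reproduce Biopython's or Foldseek's actual code. The helper names are hypothetical, and breaking occupancy ties in favour of the first record in file order is an assumption of this sketch.

```python
import math
from typing import NamedTuple


class AltLoc(NamedTuple):
    altloc: str
    occupancy: float
    coord: tuple[float, float, float]


# CB alternates of CYS A 6, listed in file order (values from the ATOM records).
CB_ALTLOCS = [
    AltLoc("A", 0.50, (21.438, 19.816, -0.079)),
    AltLoc("B", 0.50, (21.428, 19.604, 0.838)),
]


def select_highest_occupancy(altlocs: list[AltLoc]) -> AltLoc:
    """Pick the highest-occupancy alternate.

    max() returns the first maximal item, so on a tie the earliest record
    in file order wins (an assumption of this sketch).
    """
    return max(altlocs, key=lambda a: a.occupancy)


def select_last_in_file(altlocs: list[AltLoc]) -> AltLoc:
    """Pick whichever alternate appears last in the file, ignoring occupancy."""
    return altlocs[-1]


if __name__ == "__main__":
    by_occ = select_highest_occupancy(CB_ALTLOCS)
    last = select_last_in_file(CB_ALTLOCS)
    print("highest occupancy:", by_occ.altloc, by_occ.coord)
    print("last in file:     ", last.altloc, last.coord)
    # Euclidean distance between the two candidate CB positions, in Angstrom.
    print(f"shift: {math.dist(by_occ.coord, last.coord):.2f} A")
```

With equal occupancies, the two policies keep different alternates (A versus B). The two candidate CB positions are roughly 0.94 Å apart, mostly along z, so each tool encodes from a slightly different geometry.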
Expected Behavior
I am trying to convert PDB structures into 3Di sequences. For mini3di (https://github.com/althonos/mini3di/), I used
For Foldseek, I used the command suggested in issue #314.
Current Behavior
mini3di produces "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVTVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHGDD"
while Foldseek produces "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVHVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHDDD",
so the two sequences differ at two positions (54: T vs. H, and 148: G vs. D, counting from 1).
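Both outputs have the same length, so a position-by-position comparison is enough to locate the disagreements. Below is a small sketch using the two sequences exactly as reported. `diff_positions` is a hypothetical helper, and positions are 1-based indices into the 3Di string, not PDB residue numbers.

```python
MINI3DI = "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVTVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHGDD"
FOLDSEEK = "DKKKWWKDFPDPKTKIKIWDDDDLFKIKIWMKIFQADFDKKWKWWACAQDCPVHVVVSHFGAAPPDFWDFAQPDPRHGLTGDFIFGDDPRMTTDMDIHNSAGCDDPNRQQRIKMFIANAGQCGLPPPDPVSRGTSPRDDTRIMTGMHDDD"


def diff_positions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, char in a, char in b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


if __name__ == "__main__":
    for pos, x, y in diff_positions(MINI3DI, FOLDSEEK):
        print(f"position {pos}: mini3di={x} foldseek={y}")
```

For these inputs it reports two mismatches: position 54 (T vs. H) and position 148 (G vs. D).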
Environment
I used foldseek==9-427df8a (the latest) and mini3di==0.1.1.
Thanks for your help.