t7morgen / misato-dataset

GNU Lesser General Public License v2.1

On protein and ligand coordinates after molecular dynamics simulations #18

Open zhaolongNCU opened 3 weeks ago

zhaolongNCU commented 3 weeks ago

Thank you so much for your rich and meaningful work! While studying this dataset, I came across some questions. Could you provide the initial coordinates of the protein-ligand complexes before MD? We found that the protein and ligand coordinates in the MD trajectories are very different from the coordinates downloaded from the PDB. Was the MD started from the .pdb file downloaded from the PDB or from the one provided by PDBbind? If neither, could you provide the initial coordinate file used for the MD simulation, or tell me where I can get it? As an example, we looked at the first frame of the complex 1A0Q and found the following coordinates:

test['1A0Q']['trajectory_coordinates'][0]
array([[45.46517563, 21.3677578 , 54.84270477],
       [46.16779709, 21.72884941, 55.47201538],
       [45.14961624, 20.52666664, 55.30431747],
       ...,
       [29.64250374, 26.62957573, 46.50150299],
       [29.50019073, 27.99045372, 47.59023666],
       [31.38412285, 28.50348663, 46.02301407]])

However, the coordinates in the .pdb file of 1A0Q downloaded from the PDB are very different. What is the reason for this?

ATOM  1 N    ILE L 2 27.234 12.955 59.573 1.00 19.10 N
ATOM  2 HN1  ILE L 2 27.296 12.868 60.608 1.00  0.00 H
ATOM  3 HN2  ILE L 2 28.165 12.761 59.152 1.00  0.00 H
ATOM  4 HN3  ILE L 2 26.933 13.918 59.322 1.00  0.00 H
ATOM  5 CA   ILE L 2 26.259 11.993 59.062 1.00 24.58 C
ATOM  6 HA   ILE L 2 26.699 11.034 59.337 1.00  0.00 H
ATOM  7 C    ILE L 2 26.060 12.005 57.544 1.00 25.90 C
ATOM  8 O    ILE L 2 25.651 12.995 56.933 1.00 21.74 O
ATOM  9 CB   ILE L 2 24.841 12.193 59.715 1.00 19.71 C
ATOM 10 HB   ILE L 2 24.474 13.179 59.430 1.00  0.00 H
ATOM 11 CG1  ILE L 2 24.902 12.121 61.236 1.00 25.17 C
ATOM 12 1HG1 ILE L 2 25.432 11.208 61.506 1.00  0.00 H
ATOM 13 2HG1 ILE L 2 25.466 12.984 61.589 1.00  0.00 H
ATOM 14 CG2  ILE L 2 23.911 11.073 59.220 1.00 17.80 C
ATOM 15 1HG2 ILE L 2 23.831 11.123 58.134 1.00  0.00 H
ATOM 16 2HG2 ILE L 2 24.322 10.106 59.511 1.00  0.00 H

Looking forward to your reply!

t7morgen commented 3 weeks ago

Thank you very much for your questions. As a starting point we took the PDB files from PDBbind (these should be the same as the ones you find in the PDB). During parametrization and the addition of water molecules, the structures get shifted in space, so you will not find an exact match with these coordinates. I suggest aligning the coordinates in 3D before comparing them. You can do this with a 3D molecular viewer such as PyMOL or VMD, or with the align_frame_to_ref function from here: https://github.com/t7morgen/misato-dataset/blob/master/src/data/processing/preprocessing_db.py
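For readers who want to script the alignment, here is a minimal sketch of rigid-body superposition using the Kabsch algorithm with NumPy. It is not the repository's align_frame_to_ref implementation, whose signature is not shown in this thread. The helper name kabsch_align is hypothetical, and the sketch assumes both arrays hold the same atoms in the same order, which may require matching atoms between the MD frame and the PDB file first.

```python
import numpy as np


def kabsch_align(mobile, reference):
    """Rigidly superpose `mobile` onto `reference` and return (aligned, rmsd).

    Hypothetical helper for illustration. Both inputs must be (N, 3) arrays
    listing the same atoms in the same order (an assumption of this sketch).
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if mobile.shape != reference.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")

    # Remove the translation by centering both coordinate sets.
    mobile_center = mobile.mean(axis=0)
    reference_center = reference.mean(axis=0)
    p = mobile - mobile_center
    q = reference - reference_center

    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the centered mobile coordinates and move them onto the reference.
    aligned = p @ rotation.T + reference_center
    rmsd = np.sqrt(((aligned - reference) ** 2).sum(axis=1).mean())
    return aligned, rmsd
```

Called as kabsch_align(frame_coordinates, pdb_coordinates), it returns the superposed frame and the remaining RMSD. A small RMSD after alignment indicates that the two structures differ mainly by a shift and rotation in space.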

zhaolongNCU commented 3 weeks ago

OK, thank you very much for your patient answer. I think I understand now. Could you provide the protein coordinates as a .pdb file (the initial reference structure) after parametrization and water addition? Looking forward to more of your excellent work!

t7morgen commented 3 weeks ago

In principle that would be possible, but it will not differ drastically from the first frame in the current dataset. What would you need it for?

zhaolongNCU commented 2 weeks ago

I would like to use it as a control. In principle, we should use the initial complex structure (before any MD simulation) as the control for the subsequent analysis, but I see that your code uses the first frame as the control. As stated in the MISATO paper, the first frame is taken at 2 ns, so I feel it is not the initial structure, and using it as a control may introduce some bias. Thank you very much for your reply. Would it be possible to provide the initial complex structure after setup with water and the other preparation steps?