Closed: mengliu1998 closed this issue 2 years ago.
(1) The provided .types files contain only filtered poses, all with RMSD < 2 Å, so the number of lines in the train0 file equals the number of protein-ligand pairs in the training set.
(2) Sure. The columns are:
Hi @mattragoza, thank you for the response, and sorry for my late reply. This answers my question.
Hi,
Thank you for this interesting and insightful work. I would like to follow your experimental setup, and I have the following questions about the data format.
(1) How many training protein-ligand pairs do you have after filtering out all poses with RMSD greater than 2 Å? Is it 486740, the number of lines in it2_tt_0_lowrmsd_mols_train0_fixed.types?
(2) Could you explain the meaning of each line in it2_tt_0_lowrmsd_mols_train0_fixed.types? For example, what do the first three numbers in the following line mean?
1 5.119186 1.97462 1433B_HUMAN_1_240_pep_0/4gnt_A_rec.pdb 1433B_HUMAN_1_240_pep_0/4gnt_A_rec_5f74_amp_lig_tt_min_0.sdf.gz #-6.28497
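A minimal sketch for reading a .types file like this one. It relies only on the structure visible in the example line: some leading numbers, a receptor path (ending in `_rec.pdb`), a ligand path (ending in `.sdf.gz`), and an optional comment after `#`. It does not assume what the three numbers mean. The helper names `parse_types_line`, `iter_types`, and `count_pairs` are hypothetical and are not part of any released tool. Following answer (1) in this thread, counting the non-blank lines gives the number of training pairs.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class TypesLine(NamedTuple):
    numbers: Tuple[float, ...]  # leading numeric columns (meaning not assumed here)
    receptor: str               # receptor file path, e.g. ..._rec.pdb
    ligand: str                 # ligand pose file path, e.g. ..._lig_tt_min_0.sdf.gz
    comment: str                # text after '#', empty if there is none


def parse_types_line(line: str) -> TypesLine:
    # Split off the trailing '#' comment, then split the rest on whitespace.
    body, _, comment = line.partition("#")
    fields = body.split()
    # Assumption: the last two fields are file paths and everything before them is numeric.
    *nums, receptor, ligand = fields
    return TypesLine(tuple(float(x) for x in nums), receptor, ligand, comment.strip())


def iter_types(lines: Iterable[str]) -> Iterator[TypesLine]:
    # Yield one parsed record per non-blank line.
    for line in lines:
        if line.strip():
            yield parse_types_line(line)


def count_pairs(path: str) -> int:
    # One line per (filtered) protein-ligand pair, so the record count is the pair count.
    with open(path) as fh:
        return sum(1 for _ in iter_types(fh))
```

For example, `count_pairs("it2_tt_0_lowrmsd_mols_train0_fixed.types")` would count the records in the training file, and `parse_types_line(...)` on the line above returns the three numbers as floats together with the two paths and the comment `-6.28497`.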
Thank you in advance.