gihanpanapitiya closed this issue 7 months ago
Thanks for your interest.
Unfortunately I can't find the preprocessed train/valid/test files.
For the two DTA datasets, we randomly split them into train/valid/test sets, following the setting of GraphMVP. Below is from the GraphMVP paper:
Table 5: Results for four molecular property prediction tasks (regression) and two DTA tasks (regression). We report the mean RMSE of 3 seeds with scaffold splitting for molecular property downstream tasks, and mean MSE for 3 seeds with random splitting on DTA tasks. For GraphMVP, we set M = 0.15 and C = 5. The best performance for each task is marked in bold. We omit the std here since they are very small and indistinguishable. For complete results, please check Appendix G.4.
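For readers who want to reproduce a random split over several seeds, here is a minimal sketch. The 8:1:1 ratios, the function name `random_split`, and the seed values are assumptions made for illustration; the thread does not state the ratios or seeds actually used.

```python
import numpy as np


def random_split(n_samples, frac_train=0.8, frac_valid=0.1, seed=0):
    """Return shuffled index arrays for train/valid/test.

    The 8:1:1 default ratios are hypothetical, not taken from the thread.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(frac_train * n_samples)
    n_valid = int(frac_valid * n_samples)
    train_idx = perm[:n_train]
    valid_idx = perm[n_train:n_train + n_valid]
    test_idx = perm[n_train + n_valid:]
    return train_idx, valid_idx, test_idx


# One split per seed, mirroring the "3 seeds" protocol quoted above.
splits = {seed: random_split(1000, seed=seed) for seed in (0, 1, 2)}
```

Each entry in `splits` holds three disjoint index arrays that can be used to slice the dataset for that seed.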
We did not perform any preprocessing beyond preprocess.py in GraphMVP. However, we applied normalization to the labels in the tuning stage (SimSGT/regression/tuning_dta.py/train_dta, line 246).
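As a rough illustration of label normalization for a regression target, the sketch below standardizes labels with the training-set mean and standard deviation and maps predictions back to the original scale before computing the error. This is an assumption about the kind of normalization meant; the class `LabelNormalizer` is hypothetical and is not the code at line 246 of tuning_dta.py.

```python
import numpy as np


class LabelNormalizer:
    """Z-score normalizer fitted on training labels only (hypothetical helper)."""

    def __init__(self, train_labels):
        train_labels = np.asarray(train_labels, dtype=float)
        self.mean = train_labels.mean()
        self.std = train_labels.std()

    def transform(self, y):
        # Map raw labels to zero mean and unit variance.
        return (np.asarray(y, dtype=float) - self.mean) / self.std

    def inverse_transform(self, y_norm):
        # Map normalized predictions back to the original label scale.
        return np.asarray(y_norm, dtype=float) * self.std + self.mean


def mse(y_true, y_pred):
    """Mean squared error on the original label scale."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))
```

In this sketch the model would be trained on `transform(y_train)`, and MSE would be reported on `inverse_transform(predictions)` so that it stays comparable with results on the raw labels.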
Thank you very much for the details! Just for clarification, did you use the same test.csv as prepared in preprocess.py in the GraphMVP repository (https://github.com/chao1224/GraphMVP/blob/main/datasets/dti_datasets/davis/preprocess.py)?
Yes. As shown in lines 177-187 of regression/tuning_dta.py, we use the original train.csv and test.csv files processed by GraphMVP's preprocess.py.
Hello,
Can you share more details about how you prepared the DAVIS and KIBA datasets?
I downloaded these datasets from https://github.com/chao1224/GraphMVP/tree/main/datasets and preprocessed them as instructed there. I then combined the resulting train.csv and test.csv files to create the full dataset, and used scaffold splitting to split this full dataset into train, valid and test sets. For DAVIS, I used the transformed affinities, -np.log10(y / 1e9), to train the model. Is this the approach you used as well? It would be great if you could add your preprocessed train, valid and test folds to the repository.
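The affinity transform mentioned in the question can be written as a small helper. This is a minimal sketch, assuming the raw DAVIS labels are dissociation constants in nanomolar; the function name `kd_nm_to_pkd` is hypothetical.

```python
import numpy as np


def kd_nm_to_pkd(kd_nm):
    """Convert Kd values in nM to pKd = -log10(Kd in molar).

    Dividing by 1e9 converts nanomolar to molar (assumed input unit).
    """
    kd_nm = np.asarray(kd_nm, dtype=float)
    return -np.log10(kd_nm / 1e9)


# Example: weaker binders (larger Kd) get smaller pKd values.
pkd = kd_nm_to_pkd([10.0, 1000.0, 10000.0])
```

Under this transform a Kd of 10,000 nM corresponds to a pKd of 5, and each tenfold decrease in Kd raises the pKd by 1.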