Running new predictions

tlhr commented 2 months ago

Hi!

How can I run new predictions for unseen data (with no experimental pK)? It seems the process_and_predict.py script requires a dataset with pK values, and everything is hardcoded to use existing example data. Is there a separate script that can just take an SDF and PDB and output an affinity?

Thanks!

isakvals commented 2 months ago

Hi Thomas,

Yes, definitely. I think the quickest way to do that right now is to add a “pK” column to your .csv file filled with all ones or all zeros and get the predictions that way (I should fix this properly, so thank you for pointing it out). In general, you can pass your own .csv file to process_and_predict.py like so:

python process_and_predict.py --dataset_csv=PATH_TO_YOUR_CSV_FILE --data_name=DATA_NAME_OF_YOUR_CHOOSING --trained_model_name=20231116-181233_model_GATv2Net_pdbbind_core
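A minimal sketch of the dummy-pK workaround, assuming your input is a comma-separated file with a header row and that process_and_predict.py only needs a column literally named `pK`. The helper name `add_dummy_pk` is hypothetical, not part of AEV-PLIG. It copies the CSV and appends the same placeholder value to every row, so any comparison the script makes against this column is meaningless; only the predicted values are of interest.

```python
import csv


def add_dummy_pk(src: str, dst: str, value: float = 0.0) -> int:
    """Copy src to dst, appending a constant 'pK' column (hypothetical helper).

    Returns the number of data rows written.
    """
    with open(src, newline="") as fin:
        reader = csv.DictReader(fin)
        fieldnames = list(reader.fieldnames or [])
        if "pK" in fieldnames:
            raise ValueError(f"{src} already has a 'pK' column")
        fieldnames.append("pK")
        rows = list(reader)

    with open(dst, "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            row["pK"] = value  # placeholder, not an experimental affinity
            writer.writerow(row)
    return len(rows)
```

For example, `add_dummy_pk("my_complexes.csv", "my_complexes_with_pk.csv")` writes a copy whose path you can then pass as `--dataset_csv`.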

If you can, I would also recommend that you train AEV-PLIG on PDBbind + BindingNet, especially if you’re using the model on a congeneric series of ligands binding the same protein.

Best wishes, Ísak

tlhr commented 2 months ago

Thanks for the quick reply, Ísak! I got it to work. There's also an assertion that crashes the script if the predicted and prior pKs don't match: https://github.com/isakvals/AEV-PLIG/blob/99848f1a037d8a79f40a657f31c09925a99298d9/process_and_predict.py#L494

isakvals commented 2 months ago

I've changed the code so that it no longer needs a pK column and skips the assertion step. Please let me know if it works for you.
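As a rough illustration of treating the pK column as optional (not the actual change made to process_and_predict.py), here is a sketch in which labels are loaded only if the CSV has a `pK` column, and errors are attached only when labels exist, with no assertion on their values. The function names `load_optional_pk` and `pair_predictions` and the output keys are hypothetical.

```python
import csv
from typing import Dict, List, Optional, Sequence


def load_optional_pk(csv_path: str) -> Optional[List[float]]:
    """Return the 'pK' column as floats, or None if the CSV has no such column."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if "pK" not in (reader.fieldnames or []):
            return None  # unseen data: nothing to compare against
        return [float(row["pK"]) for row in reader]


def pair_predictions(
    predictions: Sequence[float], labels: Optional[Sequence[float]]
) -> List[Dict[str, float]]:
    """Attach labels and errors only when labels exist; never assert on their values."""
    if labels is not None and len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    rows = []
    for i, pred in enumerate(predictions):
        row = {"predicted_pK": pred}
        if labels is not None:
            row["pK"] = labels[i]
            row["error"] = pred - labels[i]
        rows.append(row)
    return rows
```

With this split, a CSV without a `pK` column simply yields predictions, while a labelled benchmark set still gets per-complex errors.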

tlhr commented 2 months ago

Thanks a lot, that does the trick!

isakvals / AEV-PLIG

Running new predictions #1