Info about benchmark runs (CXCR4 and CSF1R)

bill-tatsis commented 1 month ago

Hi, would it be possible to share input files and details used in the benchmark runs? Thanks :)

antoszewski commented 1 month ago

Here is the code snippet used to do the runs:

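from roshambo.api import get_similarity_scores

# dud_folders: list of per-target directories (one per benchmark target), defined elsewhere
for folder in dud_folders:
    print("{} ligands starting...".format(folder))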
    get_similarity_scores(
        ref_file="query.sdf",
        dataset_files_pattern="ligands.smi",
        ignore_hs=True,
        n_confs=60,
        keep_mol=True,
        random_seed=109838974,
        opt_confs=True,
        calc_energy=True,
        energy_iters=300,
        energy_cutoff=30,
        align_confs=True,
        rms_cutoff=0.1,
        num_threads=46,
        method="ETKDGv3",
        volume_type="analytic",
        n=2,
        epsilon=0.5,
        use_carbon_radii=True,
        color=True,
        max_conformers=1,
        sort_by="ComboTanimoto",
        write_to_file=True,
        gpu_id=0,
        working_dir="{}/ligands".format(folder),
        # smiles_kwargs={"delimiter": "\t"},  # uncomment for tab-delimited .smi files
    )

A few notes:

We downloaded the SMILES files (ligands.smi and decoys.smi for each target) from the Charged_Matched_DUDE folder at https://dudez.docking.org/
The above loop covers only the ligands; for the decoys, you would need to run it again with the working directory and dataset file changed accordingly.
The last (commented-out) option is important! Some of the .smi files are space-delimited, which the above command handles as written. If the .smi file is tab-delimited, you will need to uncomment the last option to tell the file parser that.
If there are macrocycles in the underlying .smi file (as for TRY1 and XIAP), you will need to remove them manually.
If you run out of memory on your GPU, you will need to batch the inputs manually or decrease the number of conformers. We're working on making ROSHAMBO faster and more memory-efficient now!
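
For anyone reproducing this, the delimiter check and the manual batching mentioned in the notes could be sketched roughly as below. This is only an illustration using the standard library, not something from the ROSHAMBO codebase: the helper names `detect_smi_delimiter` and `split_smi`, the batch file naming, and the 20-line sample size are all made up for the example.

```python
from itertools import islice
from pathlib import Path


def detect_smi_delimiter(path, sample_lines=20):
    """Return a tab if any of the first few lines contains one, else a space.

    Hypothetical helper: it only inspects the first `sample_lines` lines.
    """
    with open(path) as handle:
        for line in islice(handle, sample_lines):
            if "\t" in line:
                return "\t"
    return " "


def split_smi(path, batch_size, out_dir):
    """Write consecutive chunks of `batch_size` records to numbered .smi files.

    Hypothetical helper: blank lines are skipped and the paths of the
    batch files are returned in order.
    """
    path = Path(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(path) as handle:
        lines = [line for line in handle if line.strip()]

    batch_paths = []
    for start in range(0, len(lines), batch_size):
        index = start // batch_size
        batch_path = out_dir / f"{path.stem}_batch{index:03d}{path.suffix}"
        batch_path.write_text("".join(lines[start:start + batch_size]))
        batch_paths.append(batch_path)
    return batch_paths
```

Each batch file could then be passed as `dataset_files_pattern` in its own call, with `smiles_kwargs={"delimiter": "\t"}` added whenever `detect_smi_delimiter` returns a tab. Results from separate batches would still have to be merged afterwards, which this sketch does not cover.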

bill-tatsis commented 1 month ago

Thanks! I'll give it a spin 🐡

bill-tatsis commented 1 month ago

A quick follow-up question: is query.sdf a compound from the ligands.smi file?

antoszewski commented 1 month ago

Good point - for each target, it is the xtal-lig.pdb file (converted into an .sdf file) found in the DOCKING_GRIDS_AND_POSES folder of dudez.docking.org
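
The thread does not say which tool was used for the PDB-to-SDF conversion. One possible way, assuming RDKit is installed, is sketched below. The function name `pdb_ligand_to_sdf` and the optional `template_smiles` argument are invented for this example. A PDB file carries no bond orders, so a template SMILES of the ligand (if you have one) can be used to restore them.

```python
def pdb_ligand_to_sdf(pdb_path, sdf_path, template_smiles=None):
    """Convert a single-ligand PDB file to SDF and return the atom count.

    Hypothetical helper, not part of ROSHAMBO. RDKit is imported inside
    the function so the module still loads where RDKit is unavailable.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    # Hydrogens are dropped by default (removeHs=True).
    mol = Chem.MolFromPDBFile(str(pdb_path))
    if mol is None:
        raise ValueError("RDKit could not parse {}".format(pdb_path))

    if template_smiles is not None:
        # Copy bond orders from the template onto the PDB-derived molecule.
        template = Chem.MolFromSmiles(template_smiles)
        mol = AllChem.AssignBondOrdersFromTemplate(template, mol)

    writer = Chem.SDWriter(str(sdf_path))
    writer.write(mol)
    writer.close()
    return mol.GetNumAtoms()
```

A call such as `pdb_ligand_to_sdf("xtal-lig.pdb", "query.sdf")` would then produce the reference file named in the snippet above. It is worth inspecting the result, since ligands with aromatic rings or formal charges may need the template to come out with sensible bond orders.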

molecularinformatics / roshambo
