This package is a fork of the BenevolentAI code at https://github.com/BenevolentAI/guacamol_baselines. It runs generations with several models, for the tasks from the guacamol benchmark and for another task named "pi3kmtor", described in the paper.
Additionally, the package can run generations with a synthetic accessibility score as a new constraint. The synthetic accessibility score can be one of the following: RScore, SAscore, SCscore, RAscore, or RSPred.
Run poetry install to install all the required dependencies.
Download the guacamol SMILES files with the script found in guacamol_baselines/fetch_guacamol_dataset.sh
Extract the models.zip file found in the folder synthetic_scorers/RAscore
Generated SMILES are stored in a MongoDB database, so you need to set up a database and set the following environment variables to point to it:
MONGO_URL: URI and necessary credentials to the database server
DB_STORAGE: Name of the database to be used to store the sampled SMILES (the code will create collections to store the data)
Finally, if you want to use the RScore score, calculated by the Spaya API, you need to set two more environment variables:
SPAYA_API_URL
SPAYA_API_TOKEN
These should contain your credentials to use the Spaya API.
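Before launching a generation, it can help to check that these variables are set. The following is a minimal sketch, not part of the package: the variable names come from the text above, while the helper missing_env_vars and its arguments are hypothetical.

```python
import os

# Variable names are taken from the setup instructions above.
REQUIRED_VARS = ("MONGO_URL", "DB_STORAGE")
# Only needed when the RScore (Spaya API) is used.
SPAYA_VARS = ("SPAYA_API_URL", "SPAYA_API_TOKEN")


def missing_env_vars(use_rscore=False, environ=None):
    """Return the names of expected variables that are unset or empty.

    Hypothetical helper: `environ` defaults to the process environment
    and can be replaced by a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    names = REQUIRED_VARS + (SPAYA_VARS if use_rscore else ())
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(use_rscore=True)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All environment variables are set.")
```

Pass use_rscore=False when the Spaya API is not needed, so that only the MongoDB variables are checked.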
The goal-directed generations have 2 essential arguments:
suite: the benchmark suite to run; the pi3kmtor suite is used together with the synth_score variable (see below)
synth_score: RScore, SAscore, SCscore, RAscore, or RSPred. This parameter is only used by the pi3kmtor suite.
Run 10 steps of generation around the pi3kmtor dataset, optimizing 4 constraints and using the SA score constraint:
poetry run python -m guacamol_baselines.smiles_lstm_hc.goal_directed_generation --suite pi3kmtor --n_epochs 10 --synth_score SAscore
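To compare several synthetic accessibility scores, the same command can be assembled once per score. The sketch below only builds and prints the command lines. The module path and flags are copied from the command above. The helper build_command is hypothetical, and it assumes that every listed score is accepted by --synth_score exactly as spelled in this README.

```python
import shlex

# Score names as listed in this README.
SYNTH_SCORES = ["RScore", "SAscore", "SCscore", "RAscore", "RSPred"]

MODULE = "guacamol_baselines.smiles_lstm_hc.goal_directed_generation"


def build_command(synth_score, n_epochs=10, suite="pi3kmtor"):
    """Return the generation command as an argument list (hypothetical helper)."""
    if synth_score not in SYNTH_SCORES:
        raise ValueError(f"unknown synth_score: {synth_score!r}")
    return [
        "poetry", "run", "python", "-m", MODULE,
        "--suite", suite,
        "--n_epochs", str(n_epochs),
        "--synth_score", synth_score,
    ]


if __name__ == "__main__":
    for score in SYNTH_SCORES:
        # Print a shell-ready line; use subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(build_command(score)))
```

Running a command with RScore additionally requires the Spaya environment variables described above.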
The generated molecules are saved in the MongoDB database defined above, in collections named with the synth_score used by each task (e.g. benchmark_name+"_"+<synth_score>)
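The naming pattern above can be used to locate the result collections. The following is a rough sketch under stated assumptions: it relies on the third-party pymongo driver, reads the MONGO_URL and DB_STORAGE variables described earlier, and assumes that collection names end with "_" followed by the synth_score. The helpers collection_name and find_collections are hypothetical, and the actual benchmark names depend on the suite that was run.

```python
import os


def collection_name(benchmark_name, synth_score):
    """Build a collection name following the pattern given in the text."""
    return benchmark_name + "_" + synth_score


def find_collections(synth_score):
    """List collections whose name ends with "_<synth_score>" (assumed pattern)."""
    # Imported here so the naming helper works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(os.environ["MONGO_URL"])
    db = client[os.environ["DB_STORAGE"]]
    suffix = "_" + synth_score
    return [name for name in db.list_collection_names() if name.endswith(suffix)]


if __name__ == "__main__":
    # Requires a reachable MongoDB server and the environment variables above.
    for name in find_collections("SAscore"):
        print(name)
```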
For the pi3kmtor generations, the different scores of the molecules are already stored in the collection.
You can use the notebook in exploit_results/exploit_results_pi3kmtor.ipynb to analyse the results.