An open-source, end-to-end software pipeline for data curation, model building, and molecular property prediction to advance in silico drug discovery.
Created by the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium
The ATOM Modeling PipeLine (AMPL) extends the functionality of DeepChem and supports an array of machine learning and molecular featurization tools to predict key potency, safety and pharmacokinetic-relevant parameters. AMPL has been benchmarked on a large collection of pharmaceutical datasets covering a wide range of parameters. This is a living software project with active development, so check back for continued updates. Feedback is welcome and appreciated, and the project is open to contributions! An article describing the AMPL project was published in JCIM. The AMPL pipeline documentation is available here.
Check out our new tutorial series that walks through AMPL's end-to-end modeling pipeline to build a machine learning model! View them in our docs or as Jupyter notebooks in our repo.
AMPL 1.6 supports Python 3.9 on Linux, on either CPU-only machines or CUDA-enabled machines with CUDA 11.8. All other systems are experimental. For a quick install summary, see here. Other CUDA versions are not supported because they can cause dependency conflicts among the ML packages AMPL relies on. For more information, see DeepChem, TensorFlow, PyTorch, and DGL.
For installation on Apple Silicon (M-series) chips, please see the Docker container instructions.
Create your virtual environment in a convenient directory that has at least 12 GB of free space.
Go to the directory where the new environment will be installed and define an environment variable, "ENVROOT", that points to it.
export ENVROOT=~/workspace # for LLNL LC users, use your workspace
or
export ENVROOT=~ # or whichever directory you choose as your environment root
We use "workspace" and "atomsci-env" as examples here.
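Before creating the environment, you can confirm that $ENVROOT actually has the 12 GB of free space mentioned above. This is a minimal sketch using the standard df and awk tools; has_free_space is a hypothetical helper, not part of AMPL, and it treats 1 GB as 1024*1024 KB.

```shell
# Hypothetical helper (not part of AMPL): succeed if DIR has at least GB gigabytes free.
has_free_space() {
  local dir="$1" required_gb="$2"
  local avail_kb
  # -P forces POSIX one-line-per-filesystem output; -k reports 1024-byte blocks.
  # Column 4 of the second line is the space available to unprivileged users.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Usage: check the environment root (the current directory if ENVROOT is unset).
if has_free_space "${ENVROOT:-.}" 12; then
  echo "OK: ${ENVROOT:-.} has at least 12 GB free"
else
  echo "Warning: less than 12 GB free in ${ENVROOT:-.}" >&2
fi
```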
# LLNL only:
# module load python/3.9.12
cd $ENVROOT
python3.9 -m venv atomsci-env
source $ENVROOT/atomsci-env/bin/activate
pip install pip --upgrade
git clone https://github.com/ATOMScience-org/AMPL.git
Depending on system performance, creating the environment can take some time.
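After sourcing the activate script, a quick check that the intended environment is active can prevent installing packages into the wrong Python. The sketch below is hypothetical (check_active_env is not an AMPL command) and relies only on the VIRTUAL_ENV variable that venv's activate script exports.

```shell
# Hypothetical check (not part of AMPL): confirm that the active virtual
# environment is $ENVROOT/atomsci-env before installing anything into it.
check_active_env() {
  local expected="$ENVROOT/atomsci-env"
  # venv's activate script exports VIRTUAL_ENV as the environment's absolute path.
  if [ "${VIRTUAL_ENV:-}" = "$expected" ]; then
    echo "Active environment: $VIRTUAL_ENV"
  else
    echo "Expected $expected to be active, found '${VIRTUAL_ENV:-none}'" >&2
    return 1
  fi
}
```

Run check_active_env right after the source command. It compares paths literally, so if ENVROOT goes through a symlink it may report a mismatch even when the right environment is active.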
Note: Depending on whether you will run on CPU or CUDA, run only one of the following:
CPU-only installation:
cd AMPL/pip
pip install -r cpu_requirements.txt
CUDA installation:
First load the CUDA module, then install the CUDA-specific packages:
cd AMPL/pip
# LLNL only:
# module load cuda/11.8
pip install -r cuda_requirements.txt
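If you are unsure which requirements file applies, the choice can be scripted. This hypothetical helper only checks whether an NVIDIA driver responds through nvidia-smi; it does not confirm that CUDA 11.8 is installed, so treat its answer as a hint rather than a guarantee.

```shell
# Hypothetical helper (not part of AMPL): pick a requirements file based on
# whether an NVIDIA driver is visible. It does NOT verify the CUDA version.
choose_requirements() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "cuda_requirements.txt"
  else
    echo "cpu_requirements.txt"
  fi
}

# Usage, from inside AMPL/pip:
#   pip install -r "$(choose_requirements)"
```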
If you get out-of-memory errors, try setting these environment variables:
export LD_LIBRARY_PATH=<your_env>/lib:$LD_LIBRARY_PATH
export PYTHONUSERBASE=<your_env>
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=48 # adjust to the number of CPU cores on your machine
export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:128
export TF_FORCE_GPU_ALLOW_GROWTH=true
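The variables above can be wrapped in a small function so that <your_env> is filled in consistently. In this sketch, set_ampl_memory_env is hypothetical, the environment path defaults to $ENVROOT/atomsci-env, and OMP_NUM_THREADS is taken from nproc instead of a hard-coded 48 (an assumption about what suits your machine). It keeps PYTORCH_HIP_ALLOC_CONF from the list above, which applies to ROCm builds of PyTorch; CUDA builds of PyTorch read PYTORCH_CUDA_ALLOC_CONF instead.

```shell
# Hypothetical helper (not part of AMPL): export the memory-related variables,
# substituting a concrete environment path for <your_env>.
set_ampl_memory_env() {
  local env_dir="${1:-$ENVROOT/atomsci-env}"
  # Prepend env_dir/lib; add ":old value" only if LD_LIBRARY_PATH was non-empty.
  export LD_LIBRARY_PATH="$env_dir/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export PYTHONUSERBASE="$env_dir"
  export OPENBLAS_NUM_THREADS=1
  # Assumption: one OpenMP thread per available processing unit.
  export OMP_NUM_THREADS="$(nproc)"
  export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.9,max_split_size_mb:128"
  export TF_FORCE_GPU_ALLOW_GROWTH=true
}

# Usage: set_ampl_memory_env "$ENVROOT/atomsci-env"
```

The ${LD_LIBRARY_PATH:+...} expansion avoids leaving a trailing colon when LD_LIBRARY_PATH was previously empty, since an empty entry would add the current directory to the library search path.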
# LLNL only: required for ATOM model_tracker
pip install -r clients_requirements.txt
Run the following to build the "atomsci" modules. This is required.
# return to the AMPL root directory
cd ..
./build.sh
pip install -e .
Quick install summary:
export ENVROOT=~/workspace # set ENVROOT example
# LLNL only:
# module load python/3.9.12
cd $ENVROOT
python3.9 -m venv atomsci-env # create environment with Python 3.9
source $ENVROOT/atomsci-env/bin/activate
pip install pip --upgrade
git clone https://github.com/ATOMScience-org/AMPL.git # clone AMPL
cd AMPL/pip
# LLNL only:
# If using CUDA:
# module load cuda/11.8
pip install -r cpu_requirements.txt # install cpu_requirements.txt OR cuda_requirements.txt
# LLNL only: required for ATOM model_tracker
# pip install -r clients_requirements.txt
cd ..
./build.sh
pip install -e .
To run AMPL from Jupyter Notebook, set up a new kernel: first activate your environment, then run the following command:
python -m ipykernel install --user --name atomsci-env
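To confirm that the kernel was registered, you can look for it in the output of jupyter kernelspec list. The helper below is a hypothetical convenience wrapper, not part of AMPL or Jupyter; it assumes the usual output format of a header line followed by one "name path" row per kernel.

```shell
# Hypothetical helper (not part of AMPL or Jupyter): succeed if a Jupyter
# kernel with the given name is registered; fail if jupyter is not on PATH.
kernel_registered() {
  local name="$1"
  command -v jupyter >/dev/null 2>&1 || return 1
  # Skip the "Available kernels:" header, then match the first column exactly.
  jupyter kernelspec list 2>/dev/null |
    awk -v n="$name" 'NR > 1 && $1 == n { found = 1 } END { exit !found }'
}

# Usage:
#   kernel_registered atomsci-env && echo "atomsci-env kernel is available"
```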
To retrieve and run version 1.6.2 or earlier, specify the desired version tag:
docker pull atomsci/atomsci-ampl:v1.6.2
docker run -it -p 8888:8888 -v </local_workspace_folder>:</directory_in_docker> atomsci/atomsci-ampl:v1.6.2
For AMPL versions 1.6.3 and later, we offer downloadable images for various platforms (CPU, GPU, or Linux/ARM64). When running a Docker container, be sure to append bash at the end of the command to open a bash session.
docker pull atomsci/atomsci-ampl:latest-<platform> # can be cpu, gpu, or arm (for arm64 chip)
docker run -it -p 8888:8888 -v </local_workspace_folder>:</directory_in_docker> atomsci/atomsci-ampl:latest-<platform> bash
# inside the Docker container
jupyter-notebook --ip=0.0.0.0 --allow-root --port=8888 &
# -OR-
jupyter-lab --ip=0.0.0.0 --allow-root --port=8888 &
atomsci package.
For additional options related to building, running, and other Docker development tasks, please refer to Makefile.md.
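Picking the right latest-<platform> tag can be scripted from the host architecture. This is a sketch under stated assumptions: ampl_platform_tag is a hypothetical helper, it maps arm64/aarch64 hosts to the arm image, and it treats any machine with nvidia-smi on PATH as a GPU host. The optional arguments exist only to override detection, for example when testing.

```shell
# Hypothetical helper (not part of AMPL): print cpu, gpu, or arm for the
# atomsci/atomsci-ampl:latest-<platform> tag.
# Optional args: $1 = architecture (default: uname -m), $2 = yes/no for GPU.
ampl_platform_tag() {
  local arch="${1:-$(uname -m)}"
  local has_gpu="${2:-}"
  if [ -z "$has_gpu" ]; then
    if command -v nvidia-smi >/dev/null 2>&1; then has_gpu=yes; else has_gpu=no; fi
  fi
  case "$arch" in
    arm64|aarch64) echo "arm" ;;   # e.g. Apple Silicon hosts
    *) if [ "$has_gpu" = yes ]; then echo "gpu"; else echo "cpu"; fi ;;
  esac
}

# Usage (mount paths are placeholders, as in the commands above):
#   tag="latest-$(ampl_platform_tag)"
#   docker pull "atomsci/atomsci-ampl:$tag"
#   docker run -it -p 8888:8888 -v </local_workspace_folder>:</directory_in_docker> "atomsci/atomsci-ampl:$tag" bash
```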
To remove AMPL from a pip environment use:
pip uninstall atomsci-ampl
To remove an entire virtual environment named "atomsci-env":
rm -rf $ENVROOT/atomsci-env
To remove cached packages and clear space:
pip cache purge
Details on running specific features are in the parameter (options) documentation; more detailed documentation is in the library documentation.
AMPL can be run from the command line or by importing into Python scripts and Jupyter notebooks.
AMPL can be used to fit and predict molecular activities and properties by importing the appropriate modules. See the examples for more details on how to fit models and make predictions using AMPL.
AMPL includes many parameters to run various model fitting and prediction tasks.
AMPL includes detailed docstrings and comments to explain the modules. Full HTML documentation of the Python library is available with the package at https://ampl.readthedocs.io/en/latest/.
AMPL includes a suite of software tests. This section explains how to run a simple, fast test. The Python test fits a random forest model using Mordred descriptors on a set of compounds with solubility data from Delaney et al. A molecular scaffold-based split is used to create the training and test sets. In addition, an external holdout set is used to demonstrate how to make predictions on new compounds.
To run the Delaney Python script that curates a dataset, fits a model, and makes predictions, run the following commands:
source $ENVROOT/atomsci-env/bin/activate # activate your pip environment.
cd atomsci/ddm/test/integrative/delaney_RF
pytest
Note: This test generally takes a few minutes on a modern system.
The important files for this test are listed below:
test_delany_RF.py: This script loads and curates the dataset, generates a model pipeline object, and fits a model. The model is reloaded from the filesystem and then used to predict solubilities for a new dataset.
config_delaney_fit_RF.json: Basic parameter file for fitting.
config_delaney_predict_RF.json: Basic parameter file for predicting.
AMPL can fit models from the command line with:
python model_pipeline.py --config_file filename.json # [filename].json is the name of the config file
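A config file is a JSON object of pipeline parameters. The sketch below writes a minimal regression config with a heredoc. The parameter names are modeled on AMPL's parameter documentation and example configs, but treat every key and value here as an assumption, and the file my_dataset.csv and the column names compound_id, smiles, and solubility as hypothetical placeholders; compare against config_delaney_fit_RF.json before relying on it.

```shell
# Sketch: write a minimal model-fitting config. All keys/values are assumptions
# modeled on AMPL example configs; the CSV and column names are hypothetical.
cat > my_config.json <<'EOF'
{
  "datastore": "False",
  "save_results": "False",
  "prediction_type": "regression",
  "dataset_key": "my_dataset.csv",
  "id_col": "compound_id",
  "smiles_col": "smiles",
  "response_cols": "solubility",
  "featurizer": "ecfp",
  "model_type": "RF",
  "splitter": "scaffold",
  "result_dir": "./results"
}
EOF

# Then fit the model with the command shown above:
#   python model_pipeline.py --config_file my_config.json
```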
To get more info on an AMPL config file, please refer to:
To run the full set of tests, use Pytest from the test directory:
source $ENVROOT/atomsci-env/bin/activate # activate your pip environment. "atomsci-env" is an example here.
cd atomsci/ddm/test
pytest
See "atomsci/ddm/examples/tutorials" for a collection of AMPL tutorial notebooks. The notebooks cover AMPL features from the basics through advanced topics; the AMPL team prepared them to help beginners get started and to serve as a reference for advanced AMPL users.
Using "pip install -e ." creates an editable install: your environment's site-packages gets a link that points back to your git working directory, so each time you reimport a module (for example, after restarting a kernel) you'll be in sync with your working code. Since site-packages is already on your sys.path, you won't have to fuss with PYTHONPATH or set sys.path in your notebooks.
We recommend doing your work on a development branch. After each release, a new branch is opened for development.
The policy is
Note: Step 2 is required for pushing directly to "master". For a development branch, this step is recommended but not required.
The "Google docstring" format is used in the AMPL code. When writing new code, please use the same docstring style. Refer here and here for examples.
Versions are managed through GitHub tags on this repository.
The Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium
1. Lawrence Livermore National Laboratory
2. GlaxoSmithKline Inc.
3. Frederick National Laboratory for Cancer Research
4. Computable
5. University of California, San Francisco
6. Schrodinger
7. Leidos
Thank you for contributing to AMPL!
AMPL is distributed under the terms of the MIT license. All new contributions must be made under this license.
See MIT license and NOTICE for more details.