lamalab-org / MoleculeBind

MoleculeBind is a machine-learning framework for chemistry that aims to unify various molecular representations (SELFIES, SMILES, graphs, structures, fingerprints, molecular spectra) in one common latent space.
# MolBind

:scroll: Installation guide

Using mamba or conda to create a virtual environment is recommended. For inference/embeddings, install as follows:

conda create -n molbind python=3.12
conda activate molbind
pip install -e ".[inference]"

If you want to (re)train the models, your system needs to have the CUDA dependencies. In that case, use the environment.yaml file for the installation:

conda env create -f environment.yaml
conda activate molbind

:file_folder: Data availability

The simulated spectra data have been compiled from IBM's Multimodal Spectroscopic Dataset.

(WIP :building_construction:) Run molbind-get-datasets from the command line to download the data.

:clipboard: Environment file

Your environment file should look like this:

WANDB_PROJECT="<your-wandb-project-name>"
WANDB_ENTITY="<your-wandb-account-name>"
TOKENIZERS_PARALLELISM=False

After you have defined your system variables in .env, they are read into the script as follows:

from dotenv import load_dotenv
load_dotenv("path/to/.env")
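A missing or empty variable in .env may only surface later as a confusing logging error, so it can help to check the environment right after loading it. The following is a minimal sketch, not part of MolBind: the helper name `missing_env_vars` is hypothetical, and the assumption that both W&B variables are mandatory is ours.

```python
import os

# Variable names taken from the environment file shown above.
# Treating both as required is an assumption made for this sketch.
REQUIRED_VARS = ("WANDB_PROJECT", "WANDB_ENTITY")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the process environment; passing a plain
    dict makes the check easy to try out without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Called after `load_dotenv("path/to/.env")`, `missing_env_vars()` returns an empty list when everything is defined; otherwise the returned names can be reported before any training starts.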

:chart_with_downwards_trend: Train models

The experiment configs can be found in the config directory. For example, to run train.py with the train/ir_simulated experiment:

python train.py 'experiment="train/ir_simulated"'

To run the metrics on these experiments:

python retrieval.py 'experiment="metrics/ir_simulated"'
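To run training and metrics back to back for the same experiment, the two commands above can be wrapped in a short script. This is a sketch under stated assumptions: the helper names `build_command` and `run_pipeline` are hypothetical, and it assumes every `train/<name>` config has a matching `metrics/<name>` config, which the examples above suggest only for `ir_simulated`.

```python
import subprocess
import sys


def build_command(script, experiment):
    """Build the argument list for one of the commands shown above.

    The shell's single quotes are dropped because no shell is involved;
    the double quotes stay part of the override argument itself.
    """
    return [sys.executable, script, f'experiment="{experiment}"']


def run_pipeline(name):
    """Run training, then retrieval metrics, for one experiment name.

    Assumes train/<name> and metrics/<name> both exist (hypothetical
    pairing). check=True stops the pipeline if training fails.
    """
    subprocess.run(build_command("train.py", f"train/{name}"), check=True)
    subprocess.run(build_command("retrieval.py", f"metrics/{name}"), check=True)
```

For example, `run_pipeline("ir_simulated")` issues the same two commands as above, one after the other, from the directory that contains train.py and retrieval.py.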

💰 Funding

This work was funded by the Carl-Zeiss Foundation. In addition, this work was partly funded by the SOL-AI project funded as part of the Helmholtz Foundation Model Initiative of the Helmholtz Association. Moreover, this work was supported by Helmholtz AI computing resources (HAICORE) of the Helmholtz Association’s Initiative and Networking Fund through Helmholtz AI.