GROdecoder

GROdecoder extracts and identifies the molecular components of a structure file (PDB or GRO) produced by a molecular dynamics simulation.

Installation

Clone the project:

git clone https://github.com/pierrepo/grodecoder.git
cd grodecoder

Create and activate a conda environment:

conda env create -f environment.yml
conda activate grodecoder-env

Usage as a command-line tool

Run GROdecoder on a test file:

python grodecoder.py --input data/examples/barstar.gro

Add edges and degrees to the fingerprint (disabled by default):

python grodecoder.py --input data/examples/barstar.gro --checkconnectivity
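
As a rough illustration of what adding edges and degrees to a fingerprint can mean, the sketch below builds a small fingerprint for a molecular graph and extends it with connectivity data on request. The function name `graph_fingerprint`, its parameters, and the dictionary layout are hypothetical; they are not taken from the GROdecoder code base, whose actual fingerprint may differ.

```python
from collections import Counter


def graph_fingerprint(atom_names, bonds, with_connectivity=False):
    """Hypothetical fingerprint of a molecular graph (illustration only).

    atom_names: list of atom names, one per node (node id = list index)
    bonds: iterable of (i, j) node-id pairs
    """
    bonds = list(bonds)
    fingerprint = {
        "n_atoms": len(atom_names),
        "atom_names": tuple(sorted(atom_names)),
    }
    if with_connectivity:
        # Count how many bonds touch each node.
        degree = Counter()
        for i, j in bonds:
            degree[i] += 1
            degree[j] += 1
        fingerprint["n_edges"] = len(bonds)
        # The sorted degree sequence does not depend on atom ordering.
        fingerprint["degrees"] = tuple(
            sorted(degree[i] for i in range(len(atom_names)))
        )
    return fingerprint


# A three-atom, water-like graph: one central atom bonded to two others.
water = graph_fingerprint(
    ["OW", "HW1", "HW2"], [(0, 1), (0, 2)], with_connectivity=True
)
print(water)
```

With connectivity enabled, two molecules that share the same atom names but differ in bonding produce different fingerprints, at the cost of some extra computation.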

Choose the method used to compute atom pairs. If the system is known to be coarse-grained, give a threshold (a positive float); if the resolution is unknown, use 'auto' (the default):

python grodecoder.py --input data/examples/barstar.gro --bondthreshold [auto or a threshold]
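
An option that accepts either the keyword 'auto' or a positive float can be validated with a custom argparse type. The sketch below is one possible way to do it; the helper name `bond_threshold` is hypothetical and the actual parsing in grodecoder.py may be implemented differently.

```python
import argparse
import math


def bond_threshold(value):
    """Hypothetical argparse type: accept 'auto' or a positive float."""
    if value.lower() == "auto":
        return "auto"
    try:
        threshold = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected 'auto' or a positive number, got {value!r}"
        )
    # Reject zero, negative values, NaN and infinity.
    if not (math.isfinite(threshold) and threshold > 0):
        raise argparse.ArgumentTypeError(
            f"threshold must be a positive finite number, got {value!r}"
        )
    return threshold


parser = argparse.ArgumentParser()
parser.add_argument("--bondthreshold", type=bond_threshold, default="auto")

default_args = parser.parse_args([])
explicit_args = parser.parse_args(["--bondthreshold", "0.5"])
print(default_args.bondthreshold, explicit_args.bondthreshold)
```

Validating at parse time means an invalid value is reported with a usage message before any structure file is read.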

Add the PDB IDs, their putative names, and the organism names to the JSON file for each protein sequence (disabled by default):

python grodecoder.py --input data/examples/barstar.gro --querypdb

Run the web app

Run the Streamlit web app:

streamlit run streamlit_app.py

Then open your web browser at http://localhost:8501

Alternatively, use the hosted app at:

https://grodecoder.streamlit.app/

Update the database

CHARMM-GUI CSML database

Run the script to download all-atom model molecule data:

python script/scrap_charmm_gui_CSML.py

This script parses the CHARMM-GUI CSML database (https://www.charmm-gui.org/?doc=archive&lib=lipid). It scrapes the information and saves it to a CSV file, unless that file already exists. The database describes several types of molecules in all-atom representation (amino acids, nucleic acids, carbohydrates, lipids, ...), but for now only the lipid data are used. The CSV file (data/databases/lipid_CHARMM_GUI_CSML.csv) contains, for each molecule: its category, its alias, its common name, a link to view its structure, a link to download its PDB file, its formula, and the residue name from the PDB file. This information helps identify lipids in the PDB or GRO file being analyzed.
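
To illustrate how such a CSV can help identify lipids, the sketch below indexes rows by residue name and looks up a residue found in a structure file. The column headers (`category`, `alias`, `name`, `residue_name`) and the two sample rows are assumptions made for the example; they are not copied from the real file, whose headers may differ.

```python
import csv
import io

# Stand-in for the CSV file. Headers and rows are illustrative
# placeholders, not the real content of the database file.
SAMPLE_CSV = """category,alias,name,residue_name
phospholipid,POPC,1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine,POPC
sterol,CHL1,cholesterol,CHL1
"""


def index_by_residue_name(handle, column="residue_name"):
    """Map each upper-cased residue name to the CSV rows that carry it."""
    index = {}
    for row in csv.DictReader(handle):
        key = row[column].strip().upper()
        index.setdefault(key, []).append(row)
    return index


lipids = index_by_residue_name(io.StringIO(SAMPLE_CSV))

# Look up a residue name as it would appear in a PDB or GRO file.
matches = lipids.get("POPC", [])
print([match["name"] for match in matches])
```

To use the real file, the `io.StringIO` object would be replaced by an open file handle and `column` adjusted to the actual header.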

MAD database

Run the script to download coarse-grain model molecule data:

python script/scrap_MAD.py

This script parses the MAD database (https://mad.ibcp.fr/explore). It scrapes the information and saves it to a CSV file, unless that file already exists. The database describes several types of molecules in coarse-grained representation (amino acids, solvents, sugars, lipids, ...), but for now only the lipid data are used. The CSV file (data/databases/lipid_MAD.csv) contains, for each molecule: its common name, its alias, its category, and a link to download its PDB file. This information helps identify lipids in the PDB or GRO file being analyzed.
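
The "write the CSV only if it does not already exist" behaviour described above can be sketched as follows. The helper `write_csv_if_missing`, the `fake_fetch` stand-in, and the file name used here are hypothetical; the real scripts perform web scraping and may be organised differently.

```python
import csv
import tempfile
from pathlib import Path


def write_csv_if_missing(csv_path, header, fetch_rows):
    """Write the CSV only when it is absent; return True if it was written.

    fetch_rows is called only when the file is missing, so the
    (potentially slow) scraping step is skipped on later runs.
    """
    path = Path(csv_path)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(fetch_rows())
    return True


def fake_fetch():
    """Placeholder for the scraping step (hypothetical data)."""
    return [["POPC", "lipid"]]


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "databases" / "lipid_example.csv"
    first = write_csv_if_missing(target, ["alias", "category"], fake_fetch)
    second = write_csv_if_missing(target, ["alias", "category"], fake_fetch)
    print(first, second)
```

With this kind of guard, refreshing the data requires deleting the existing CSV file before running the script again.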
