In addition to the ligand analysis on exact matches in ChEMBL, add an analysis of each ligand's highest similarity to the ChEMBL dataset, so that we can check how similar the recombined ligands are to known molecules.
Todos
[x] Update script that prepares ChEMBL data: ligand_analysis/chembl_standardize_inchi.py
  [x] Rename to ligand_analysis/cli_prepare_chembl.py
  [x] Add this script to setup.py for CLI access
  [x] Keep chembl_id column in ChEMBL DataFrame
[x] Update CLI script that loads all data and runs the comparison: ligand_analysis/cli.py
  [x] Rename to ligand_analysis/cli_ligand_analysis.py?
  [x] When loading ChEMBL data from file, add fingerprint column to ChEMBL DataFrame
[x] Update function that compares ligands to ChEMBL data: ligand_analysis/analyze/_analyze_ligand.py
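
The "highest similarity" comparison from the last item can be sketched roughly as below. This is a minimal, library-free illustration, not the code in `_analyze_ligand.py`: fingerprints are represented as plain sets of on-bit indices (a stand-in for RDKit bit vectors), and the names `tanimoto`, `most_similar`, and the `CHEMBL_A`-style IDs are hypothetical. With real RDKit fingerprints, `DataStructs.BulkTanimotoSimilarity` would likely replace the inner loop.

```python
from typing import Dict, FrozenSet, Tuple

# Stand-in for an RDKit bit vector: the indices of the bits that are set.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar(query: Fingerprint, chembl: Dict[str, Fingerprint]) -> Tuple[str, float]:
    """Return (chembl_id, similarity) of the ChEMBL entry most similar to query."""
    if not chembl:
        raise ValueError("ChEMBL fingerprint table is empty")
    best_id = max(chembl, key=lambda cid: tanimoto(query, chembl[cid]))
    return best_id, tanimoto(query, chembl[best_id])


# Toy data with hypothetical IDs; real fingerprints would be computed on load.
chembl_fps = {
    "CHEMBL_A": frozenset({1, 2, 3, 4}),
    "CHEMBL_B": frozenset({1, 2, 7}),
    "CHEMBL_C": frozenset({8, 9}),
}
ligand_fp = frozenset({1, 2, 3, 5})

best_id, best_sim = most_similar(ligand_fp, chembl_fps)
print(best_id, round(best_sim, 2))
```

Returning the `chembl_id` alongside the score is why the ID column is kept in the ChEMBL DataFrame: each ligand's best match can then be traced back to a specific known molecule.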
Questions
[x] How to save an rdkit.DataStructs.cDataStructs.ExplicitBitVect object to CSV? > Not necessary: we won't store the fingerprints during ChEMBL data standardization (a one-time step using ligand_analysis/cli_prepare_chembl.py); instead, we compute them when we load the standardized data from file during the ligand analysis (ligand_analysis/cli.py).
Status