matthiaskoenig / brendapy

BRENDA parser in python
GNU Lesser General Public License v3.0

Complete mapping of substance names #17

Open matthiaskoenig opened 5 years ago

matthiaskoenig commented 5 years ago

In the flat file, substances are provided only by name. To my knowledge, no mapping file from names to BRENDA ligand IDs or ChEBI exists that could be used to resolve them. Substance names need to be converted to proper ontology annotations; for example, D-glucose must be mapped to the corresponding ChEBI entry.

At the moment, substances in BRENDA are mapped to ChEBI by exact matching of their names against ChEBI names and synonyms. This mapping is far from complete, and many substances cannot be resolved. Some heuristic name matching is probably needed to solve this fully (if anybody knows of a mapping file containing BRENDA names, please let me know).
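A minimal sketch of what such heuristic matching could look like, assuming the ChEBI names and synonyms are already loaded into a dictionary. The function names, the toy `CHEBI_NAMES` data and the 0.9 cutoff are illustrative assumptions, not part of brendapy, and the ChEBI IDs shown should be verified against ChEBI before use.

```python
import difflib
import re


def normalize_name(name):
    """Lowercase a name, drop BRENDA-style /in and /out suffixes, unify separators."""
    name = name.strip().lower()
    name = re.sub(r"/(in|out)$", "", name)  # transport annotations such as "alanine/in"
    name = re.sub(r"[_\s]+", " ", name)     # underscores and repeated whitespace
    return name


def build_index(chebi_names):
    """Map every normalized name or synonym to its ChEBI ID (first entry wins)."""
    index = {}
    for chebi_id, names in chebi_names.items():
        for synonym in names:
            index.setdefault(normalize_name(synonym), chebi_id)
    return index


def map_substance(name, index, cutoff=0.9):
    """Return (chebi_id, method): exact match first, then a fuzzy fallback."""
    key = normalize_name(name)
    if key in index:
        return index[key], "exact"
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], "fuzzy"
    return None, "unmapped"


# Toy data for illustration only (assumption: verify IDs against ChEBI).
CHEBI_NAMES = {
    "CHEBI:16977": ["L-alanine", "(S)-alanine", "L-2-aminopropionic acid"],
    "CHEBI:17634": ["D-glucose"],
}

if __name__ == "__main__":
    index = build_index(CHEBI_NAMES)
    for substance in ["L-alanine/in", "L-2-aminopropionic_acid", "L-alanin", "unobtainium"]:
        print(substance, map_substance(substance, index))
```

Fuzzy hits should be reviewed by hand: stereoisomers often differ by a single character, so a lower cutoff can easily map a name to the wrong compound.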

Attached are the substance names that cannot be mapped:

unmapped_substances.txt

DeepaMahm commented 4 years ago

@matthiaskoenig Hi, I found this link (http://mmtb.tu-bs.de/idparser) for looking up BRENDA ligand IDs. For instance, I tried to obtain the mapping for a few compounds listed in unmapped_substances.txt.

The output is a BRENDA ligand ID. I am not sure how a BRENDA ligand ID can be mapped to other compound identifiers such as KEGG IDs, though. I've written to the people behind http://mmtb.tu-bs.de/idparser to ask how to map BRENDA IDs to KEGG IDs (or others) and will post here if I get a response.

EDIT: Searching http://mmtb.tu-bs.de/ by compound name returns all known synonyms of the compound and also maps it to a BRENDA compound ID. For instance, searching for L-alanine returns the following synonyms:

(S)-2-aminopropanoic_acid
(S)-alanine
2-Aminopropanoate
2-Aminopropionate
Ala
alanine
alanine/in
alanine/out
alpha-alanine
L-2-Aminopropionate
L-2-aminopropionic_acid
L-Ala
L-alanin
L-alanine
L-alanine/in
L-alanine/out
L-alpha-alanine
L-alpha-aminopropionic_acid

All of the above synonyms are mapped to a single BRENDA compound ID, https://www.brenda-enzymes.org/ligand.php?brenda_group_id=97. In BRENDA, L-alanine is linked to the InChIKey QNAYBMKLOCPYGJ-REOHCLBHSA-N.
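Because the InChIKey is a fixed-format identifier, it can serve as a join key between BRENDA and other databases. The sketch below splits a key into its blocks, assuming the standard 27-character layout (14-character connectivity hash, 8-character hash of the remaining layers, a standard/non-standard flag, a version letter and a protonation letter). The helper names are hypothetical.

```python
import re

# Layout: 14 letters, hyphen, 8 letters + flag (S or N) + version letter, hyphen, 1 letter
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if the format is wrong."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"Not a valid InChIKey: {key!r}")
    connectivity, layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,  # hash of the molecular skeleton
        "layers": layers,              # hash of stereo, isotope and other layers
        "standard": flag == "S",       # True for a standard InChIKey
        "version": version,
        "protonation": protonation,
    }


def same_skeleton(key_a, key_b):
    """True if two keys share the connectivity block (e.g. stereoisomers)."""
    return split_inchikey(key_a)["connectivity"] == split_inchikey(key_b)["connectivity"]


if __name__ == "__main__":
    print(split_inchikey("QNAYBMKLOCPYGJ-REOHCLBHSA-N"))
```

Comparing only the first block is a looser match than comparing full keys: it groups stereoisomers together, which can help find candidate ChEBI entries when the exact key has no counterpart.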

Hope this is useful

Thanks, Deepa

matthiaskoenig commented 4 years ago

Hi @DeepaMahm, thanks for the input. I will have a look at the resource. Best Matthias

DeepaMahm commented 4 years ago

Hi @matthiaskoenig, it is now possible to access the BRENDA database using zeep in Python 3, which was problematic before. BRENDA fixed this last week.

Please see https://www.brenda-enzymes.org/soap.php. From the list of fields returned in the output,

parameters = ( "j.doe@example.edu",password,"ecNumber*1.1.1.1","organism*Homo sapiens","kmValue*",
              "kmValueMaximum*","substrate*","commentary*","ligandStructureId*","literature*" )

it appears that ligandStructureId can also be obtained. I tried this, but for "ecNumber*1.1.1.1" and "organism*Homo sapiens" every field except ligandStructureId was returned.

I have raised this issue with BRENDA again and will post here once it is working.

Thanks, Deepa

DeepaMahm commented 4 years ago

Hi @matthiaskoenig

This is an update on obtaining the ligandStructureId of BRENDA compounds, as a follow-up to the thread above.

The zeep interface of BRENDA was fixed a couple of months ago, and it is now possible to obtain the following fields (listed in the parameters variable below) through a query:

from zeep import Client
import hashlib

wsdl = "https://www.brenda-enzymes.org/soap/brenda_zeep.wsdl"
password = hashlib.sha256(str("enterpassword").encode('utf-8')).hexdigest()
client = Client(wsdl)
parameters = ("enteremailid", password, "ecNumber*1.1.1.1", "organism*Homo sapiens", "kmValue*",
              "kmValueMaximum*", "substrate*", "commentary*", "ligandStructureId*", "literature*")
resultString = client.service.getKmValue(*parameters)
print(resultString)

After this, we could map the BRENDA ligandStructureId to other compound identifiers such as ChEBI or SABIO compound IDs using the service available at https://www.ebi.ac.uk/unichem/

I hope this is useful for mapping BRENDA compound/substance names.

Thanks, Deepa

DeepaMahm commented 4 years ago

Hi @matthiaskoenig, I tried to do the mapping (BRENDA ligand ID to KEGG, HMDB and ChEBI identifiers) via UniChem's REST interface. Please check this at your convenience.

DeepaMahm commented 4 years ago

#!/usr/bin/python
from zeep import Client
import hashlib

wsdl = "https://www.brenda-enzymes.org/soap/brenda_zeep.wsdl"
# BRENDA expects the SHA-256 hex digest of the account password
password = hashlib.sha256(str("enterpassword").encode('utf-8')).hexdigest()
client = Client(wsdl)

# Alternative query (Km values), kept for reference
parameters = ("emailid", password, "ecNumber*1.1.1.1", "organism*Homo sapiens", "kmValue*",
              "kmValueMaximum*", "substrate*", "commentary*", "ligandStructureId*", "literature*")
# resultString = client.service.getKmValue(*parameters)

# Look up the BRENDA ligand structure ID for a compound name
param = ("emailid", password, "NAD+")
resultString = client.service.getLigandStructureIdByCompoundName(*param)

print(resultString)

b-tierney commented 3 years ago

Hi @matthiaskoenig @DeepaMahm --

Has there been any progress on this task? I've run into a similar problem, compounded by the fact that I'm working at a scale large enough that it would be ideal, if possible, not to use zeep or an online client.

Specifically, I'm looking to map BRENDA ligands to their InChIKeys, ideally through ChEBI. Are there any flat files available now from which I can get BRENDA ligand ID > ChEBI > etc.?

Happy to open another issue if this goes beyond the scope of this one.

Thanks so much,

Braden Tierney

maxall41 commented 9 months ago

Mapping structure IDs with UniChem did not work for me: the UniChem BRENDA source seems to expect ligand IDs rather than group IDs, so it maps to the wrong ligand. I couldn't find another way to do this, so I ended up writing some code that scrapes the BRENDA website to generate SMILES for a given molecule name, and I have found that to work far better than anything else I have tried. Code:

import hashlib

import requests
from bs4 import BeautifulSoup
from rdkit import Chem
from zeep import Client

def get_ligand_smiles_from_name(name):
  try:
    wsdl = "https://www.brenda-enzymes.org/soap/brenda_zeep.wsdl"
    # BRENDA expects the SHA-256 hex digest of the account password
    password = hashlib.sha256(str("password!").encode('utf-8')).hexdigest()
    client = Client(wsdl)
    param = ("email@mail.com", password, name)

    # Resolve the compound name to a BRENDA ligand structure (group) ID
    resultString = client.service.getLigandStructureIdByCompoundName(*param)

    # Fetch the ligand page and locate the molfile download link
    URL = f"https://www.brenda-enzymes.org/ligand.php?brenda_group_id={resultString}"
    r = requests.get(URL)

    soup = BeautifulSoup(r.content, 'html5lib') # If this line causes an error, run 'pip install html5lib' or install html5lib
    download_button = soup.find('a', attrs = {'class':'download'})

    url = "https://www.brenda-enzymes.org/" + download_button['href'].replace("./","")

    # Parse the molfile in memory: writing to f"{name}.mol" fails for
    # names that contain "/" (e.g. "alanine/in")
    mol = Chem.MolFromMolBlock(requests.get(url).text)
    smiles = Chem.MolToSmiles(mol)

    return smiles
  except Exception as error:
    print(f"Failed on: {name} with error: {error}")
    return f"Failed - {error}"
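Since the function above signals errors by returning a string that starts with "Failed", a small wrapper can separate resolved names from failures when mapping many names in one run. This is a sketch only: `map_names` and its `lookup` and `delay` parameters are hypothetical, and `lookup` is meant to be any callable with the same return convention, such as `get_ligand_smiles_from_name`.

```python
import time


def map_names(names, lookup, delay=0.0):
    """Apply lookup(name) to each distinct name; return (resolved, failed) dicts."""
    resolved, failed = {}, {}
    for name in dict.fromkeys(names):  # drop duplicates, keep input order
        result = lookup(name)
        if isinstance(result, str) and result.startswith("Failed"):
            failed[name] = result      # keep the error message for later inspection
        else:
            resolved[name] = result
        if delay:
            time.sleep(delay)          # optional pause between remote requests
    return resolved, failed


if __name__ == "__main__":
    # Stand-in lookup so the example runs without network access
    def fake_lookup(name):
        return "Failed - not found" if name == "unknown" else "C"

    print(map_names(["methane", "unknown", "methane"], fake_lookup))
```

Deduplicating the input avoids repeated SOAP calls and page fetches for the same name, and the `failed` dictionary gives a list of names to retry or resolve by hand.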