petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

AltLabel with different delimiter #94

Open ShweataNHegde opened 3 years ago

ShweataNHegde commented 3 years ago

We want to retrieve altLabels from Wikidata with a delimiter other than the default comma. The default delimiter is a problem, especially for the plantCompound dictionary, because IUPAC names often contain commas, which makes it hard to split the synonyms apart.
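To make the problem concrete, here is a small sketch using two synonyms that appear among the carvacrol altLabels. It assumes the labels are joined with a comma followed by a space; the variable names are only for illustration.

```python
# Illustration only: two synonyms taken from the carvacrol altLabels.
synonyms = ["p-Mentha-1,3,5-trien-2-ol", "Cymenol"]

# Assumption: the default output joins labels with ", ".
comma_joined = ", ".join(synonyms)
pipe_joined = " | ".join(synonyms)

# Splitting on the comma breaks the IUPAC-style name into fragments.
comma_split = [part.strip() for part in comma_joined.split(",")]
# Splitting on " | " recovers the original two synonyms.
pipe_split = pipe_joined.split(" | ")

print(comma_split)  # ['p-Mentha-1', '3', '5-trien-2-ol', 'Cymenol']
print(pipe_split)   # ['p-Mentha-1,3,5-trien-2-ol', 'Cymenol']
```

With a comma there is no way to tell which commas separate synonyms and which belong to a name, so the split cannot be repaired afterwards.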

ShweataNHegde commented 3 years ago

I found a Stack Overflow solution for this: https://stackoverflow.com/questions/46850562/how-to-query-wikidata-for-also-known-as

Here is an example:

SELECT ?compound ?compoundLabel ?compoundDescription
       (GROUP_CONCAT(DISTINCT(?altLabel); separator = " | ") AS ?altLabel_list)
WHERE {
  VALUES ?compound { wd:Q225543 wd:Q416114 }
  OPTIONAL { ?compound skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?compound ?compoundLabel ?compoundDescription
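If we want to generate this query for any list of compounds and any delimiter, something like the following might work. It is only a sketch: `build_altlabel_query` and its parameters are hypothetical names, not part of any existing CEVOpen code, and it only builds the query string without sending it anywhere.

```python
def build_altlabel_query(qids, separator=" | ", lang="en"):
    """Return a SPARQL query that joins altLabels with `separator`.

    Hypothetical helper: it mirrors the query in this issue and only
    parameterizes the item IDs, the delimiter, and the language.
    """
    for qid in qids:
        # Accept only plain Wikidata item IDs such as "Q225543".
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    if '"' in separator or "\\" in separator:
        # Keep the separator safe to embed in a SPARQL string literal.
        raise ValueError("separator must not contain quotes or backslashes")

    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?compound ?compoundLabel ?compoundDescription\n"
        f'       (GROUP_CONCAT(DISTINCT(?altLabel); separator = "{separator}") AS ?altLabel_list)\n'
        "WHERE {\n"
        f"  VALUES ?compound {{ {values} }}\n"
        f'  OPTIONAL {{ ?compound skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "{lang}") }}\n'
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}\n'
        "}\n"
        "GROUP BY ?compound ?compoundLabel ?compoundDescription"
    )


query = build_altlabel_query(["Q225543", "Q416114"])
print(query)
```

The printed string can be pasted into the Wikidata Query Service. The delimiter should be something that does not occur inside the names themselves; " | " looks safe for the two compounds here, but that is an assumption worth checking for the whole dictionary.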


By using GROUP_CONCAT, we can customize the delimiter. Here is what the XML returned by the SPARQL endpoint looks like:

<?xml version="1.0" encoding="UTF-8"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="compound"/>
    <variable name="compoundLabel"/>
    <variable name="compoundDescription"/>
    <variable name="altLabel_list"/>
  </head>
  <results>
    <result>
      <binding name="compound">
        <uri>http://www.wikidata.org/entity/Q225543</uri>
      </binding>
      <binding name="compoundLabel">
        <literal xml:lang="en">carvacrol</literal>
      </binding>
      <binding name="compoundDescription">
        <literal xml:lang="en">chemical compound</literal>
      </binding>
      <binding name="altLabel_list">
        <literal>Carvacrol | 1-Hydroxy-2-methyl-5-isopropylbenzene | 1-Methyl-2-hydroxy-4-isopropylbenzene | 2-Hydroxy-4-isopropyl-1-methylbenzene | 2-Hydroxy-p-cymene | 2-Hydroxycymene | 2-Methyl-5-(1-methylethyl)-Phenol | 2-Methyl-5-(1-methylethyl)phenol | 2-Methyl-5-isopropylphenol | 2-p-Cymenol | 3-Isopropyl-6-methyl-Phenol | 3-Isopropyl-6-methylphenol | 5-Isopropyl-2-methyl-Phenol | 5-Isopropyl-2-methylphenol | 5-Isopropyl-o-cresol | 6-Methyl-3-isopropylphenol | Antioxine | BENZENE,2-HYDROXY,4-ISOPROPYL,1-METHYL CARVACROL | Cymenol | Cymophenol | FEMA 2245 | Hydroxy-p-cymene | Isopropyl-O-cresol | Isothymol | Isothymol (=2-Isopropyl-4-methyl phenol) | Karvakrol | Methyl-5-(1-methylethyl)phenol | O-Thymol | Oxycymol | p-Cymen-2-ol | p-Cymene-2-ol | p-Mentha-1,3,5-trien-2-ol</literal>
      </binding>
    </result>
    <result>
      <binding name="compound">
        <uri>http://www.wikidata.org/entity/Q416114</uri>
      </binding>
      <binding name="compoundLabel">
        <literal xml:lang="en">(+/-)-4-terpineol</literal>
      </binding>
      <binding name="compoundDescription">
        <literal xml:lang="en">chemical compound</literal>
      </binding>
      <binding name="altLabel_list">
        <literal>(+-)-p-Menth-1-en-4-ol | 1-Isopropyl-4-methyl-3-cyclohexen-1-ol | 1-isopropyl-4-methylcyclohex-3-en-1-ol | 1-Menthene-4-ol | 1-Methyl-4-isopropyl-1-cyclohexen-4-ol | 1-p-Menthen-4-ol | 1-para-Menthen-4-ol | 1-Terpinen-4-ol | 4-Carvomenthenol | 4-Methyl-1-(1-methylethyl)-3-cyclohexen-1-ol | 4-Methyl-1-isopropyl-3-cyclohexen-1-ol | 4-Terpineol | alpha -Terpinen-4-ol | alpha-terpinen-4-ol | FEMA 2248 | Origanol | p-Menth-1-en-4-ol | Terpene-4-ol | Terpin-4-en-1-ol | Terpinen-4-ol | Terpinene-4-ol | Terpinenol-4 | Terpineol-4</literal>
      </binding>
    </result>
  </results>
</sparql>
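
To get the synonyms back out of a result like this, a parser along the following lines could be used. It is a rough sketch using only the Python standard library; `parse_altlabels` is a hypothetical name, and the embedded `sample` is a shortened version of the carvacrol result, not real endpoint output.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL XML results format, as declared on <sparql>.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_altlabels(xml_text, separator=" | "):
    """Map each compound URI to the list of its altLabels (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    synonyms = {}
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # Each binding wraps one child element (<uri> or <literal>).
            value = next(iter(binding), None)
            row[binding.get("name")] = (value.text or "") if value is not None else ""
        joined = row.get("altLabel_list", "")
        # An empty string means the compound has no altLabels.
        synonyms[row["compound"]] = joined.split(separator) if joined else []
    return synonyms


# Shortened sample for illustration only.
sample = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="compound"/><variable name="altLabel_list"/></head>
  <results>
    <result>
      <binding name="compound"><uri>http://www.wikidata.org/entity/Q225543</uri></binding>
      <binding name="altLabel_list"><literal>Cymenol | p-Mentha-1,3,5-trien-2-ol</literal></binding>
    </result>
  </results>
</sparql>"""

print(parse_altlabels(sample))
# {'http://www.wikidata.org/entity/Q225543': ['Cymenol', 'p-Mentha-1,3,5-trien-2-ol']}
```

The `separator` argument has to match the one used in the query, and the split keeps names such as p-Mentha-1,3,5-trien-2-ol intact because their commas are no longer treated as delimiters.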