petermr / dictionary

Collection of Wikidata-based dictionaries for scientific annotation and searching
Apache License 2.0

Requirements for Organization Dictionary #3

Open ShweataNHegde opened 3 years ago

ShweataNHegde commented 3 years ago

The Current Version of the Dictionary

The organization dictionary needs updates. Here is the list of requirements:

  1. We want the DictionaryEditor to reference "country" as a related item, along these lines:
    <entry description="South Korean multinational conglomerate" name="Samsung" term="Samsung" wikidataURL="http://www.wikidata.org/entity/Q20716" wikipediaURL="https://en.wikipedia.org/wiki/Samsung" wikidataID="Q20716">
      <synonym>Samsung chaebol</synonym>
      <synonym>Samsung Group</synonym>
      <related role="country" wikidataID="Q884">South Korea</related>
      <related role="crossrefid" wikidataID="">100004358</related>
    </entry>
  2. There are duplicate entries in the dictionary. For example, if an organization has two or more CrossRef IDs, each of them gets a separate entry in the dictionary. We would like a Python tool that goes through the dictionary, finds entries with the same Wikidata ID, and merges them into one.
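
Requirement 2 could be sketched roughly as follows, using only the standard library. The `<dictionary>` root element, the names `child_key` and `merge_duplicate_entries`, and the second CrossRef value (`PLACEHOLDER-ID`) are assumptions made for illustration; they are not taken from the actual dictionary or from an existing tool.

```python
import xml.etree.ElementTree as ET


def child_key(child):
    """Identity of a child element: tag, attributes, and stripped text."""
    return (child.tag, tuple(sorted(child.attrib.items())), (child.text or "").strip())


def merge_duplicate_entries(root):
    """Merge <entry> children of root that share a wikidataID (in place)."""
    first_seen = {}  # wikidataID -> first <entry> carrying that ID
    for entry in list(root.findall("entry")):
        wikidata_id = entry.get("wikidataID")
        if not wikidata_id:
            continue  # no ID to merge on; leave the entry alone
        kept = first_seen.setdefault(wikidata_id, entry)
        if kept is entry:
            continue  # first occurrence of this ID
        present = {child_key(c) for c in kept}
        for child in list(entry):
            key = child_key(child)
            if key not in present:  # skip synonyms/related items already there
                kept.append(child)
                present.add(key)
        root.remove(entry)
    return root


# Hypothetical input: the root element name and PLACEHOLDER-ID are assumptions.
SAMPLE = """<dictionary title="organization">
  <entry name="Samsung" term="Samsung" wikidataID="Q20716">
    <synonym>Samsung Group</synonym>
    <related role="crossrefid" wikidataID="">100004358</related>
  </entry>
  <entry name="Samsung" term="Samsung" wikidataID="Q20716">
    <synonym>Samsung Group</synonym>
    <related role="crossrefid" wikidataID="">PLACEHOLDER-ID</related>
  </entry>
</dictionary>"""

root = ET.fromstring(SAMPLE)
merge_duplicate_entries(root)
print(ET.tostring(root, encoding="unicode"))
```

In this sketch the attributes of the first entry win, and children are compared on tag, attributes, and text, so a repeated synonym is kept only once while each distinct CrossRef ID survives as its own `related` element.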
petermr commented 3 years ago

The query to add to the dictionary:

# Organization
SELECT ?OrganizationLabel ?Country ?CountryLabel ?instanceofLabel  ?Organization ?crossrefid 

WHERE {
  ?Organization wdt:P3153 ?crossrefid .
  OPTIONAL { ?Organization wdt:P31 ?instanceof . }
  ?Organization wdt:P17 ?Country .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000
petermr commented 3 years ago

The query can also be expressed as a URL, with the query text URL-encoded.

(The first version below is broken up with spaces for readability; don't use it.)

https://query.wikidata.org/#%23%20Organization%0ASELECT%20%3FOrganizationLabel%20%3F Country%20%3FCountryLabel%20%3FinstanceofLabel%20%20%3FOrganization%20%3F crossrefid%20%0A%0AWHERE%20%7B%0A%20%20%3FOrganization%20wdt%3AP3153%20%3F crossrefid%20.%0AOPTIONAL%20%20%7B%3FOrganization%20wdt%3AP31%20%3F instanceof%20.%7D%0A%20%20%3FOrganization%20wdt%3AP17%20%3F Country%20.%0A%20%20%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3A serviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2020000000


Suggestion: we can add this URL to the dictionary:
https://query.wikidata.org/#%23%20Organization%0ASELECT%20%3FOrganizationLabel%20%3FCountry%20%3FCountryLabel%20%3FinstanceofLabel%20%20%3FOrganization%20%3Fcrossrefid%20%0A%0AWHERE%20%7B%0A%20%20%3FOrganization%20wdt%3AP3153%20%3Fcrossrefid%20.%0AOPTIONAL%20%20%7B%3FOrganization%20wdt%3AP31%20%3Finstanceof%20.%7D%0A%20%20%3FOrganization%20wdt%3AP17%20%3FCountry%20.%0A%20%20%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2020000000
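
As a rough sketch of how such a URL can be produced, the query text can be percent-encoded with `urllib.parse.quote`. The helper names `gui_url` and `endpoint_url` are made up for this example. The `#` form opens the query in the Wikidata Query Service web page; the `/sparql?query=` form with `format=json` is, as far as we know, the one to use for programmatic access, but check the service documentation before relying on it.

```python
from urllib.parse import quote

# Same query as above, lightly re-indented.
QUERY = """# Organization
SELECT ?OrganizationLabel ?Country ?CountryLabel ?instanceofLabel ?Organization ?crossrefid
WHERE {
  ?Organization wdt:P3153 ?crossrefid .
  OPTIONAL { ?Organization wdt:P31 ?instanceof . }
  ?Organization wdt:P17 ?Country .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000"""

GUI_BASE = "https://query.wikidata.org/#"
ENDPOINT_BASE = "https://query.wikidata.org/sparql?query="


def gui_url(query):
    """URL that opens the query in the query service web page."""
    return GUI_BASE + quote(query, safe="")


def endpoint_url(query, fmt="json"):
    """URL for fetching results directly (assumed endpoint form)."""
    return ENDPOINT_BASE + quote(query, safe="") + "&format=" + fmt


print(gui_url(QUERY))
print(endpoint_url(QUERY))
```

Passing `safe=""` makes `quote` encode every reserved character, including `/`, which matches the style of the URLs above (`%23` for `#`, `%20` for a space, `%0A` for a newline, `%3F` for `?`).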
petermr commented 3 years ago

We can also use a shortened query URL:

https://w.wiki/p3k

We don't know how persistent this short link will be.

petermr commented 3 years ago

We should also add the Wikidata property ID (roleWikidataID) for each related item:

<entry description="South Korean multinational conglomerate" name="Samsung" term="Samsung" wikidataURL="http://www.wikidata.org/entity/Q20716" wikipediaURL="https://en.wikipedia.org/wiki/Samsung" wikidataID="Q20716">
  <synonym>Samsung chaebol</synonym>
  <synonym>Samsung Group</synonym>
  <related roleWikidataID="P17" role="country" wikidataID="Q884">South Korea</related>
  <related roleWikidataID="P3153" role="crossrefid" wikidataID="">100004358</related>
</entry>
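
A minimal sketch of how a tool might emit such `related` elements with ElementTree. The helper name `add_related` and its parameters are hypothetical; only the attribute names and the Samsung values come from the example above.

```python
import xml.etree.ElementTree as ET


def add_related(entry, role, role_pid, text, wikidata_id=""):
    """Append a <related> child carrying both the role and its property ID."""
    related = ET.SubElement(
        entry,
        "related",
        {"roleWikidataID": role_pid, "role": role, "wikidataID": wikidata_id},
    )
    related.text = text
    return related


entry = ET.Element(
    "entry", {"name": "Samsung", "term": "Samsung", "wikidataID": "Q20716"}
)
add_related(entry, "country", "P17", "South Korea", "Q884")
add_related(entry, "crossrefid", "P3153", "100004358")  # literal value, no Q-ID
print(ET.tostring(entry, encoding="unicode"))
```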
petermr commented 3 years ago

This extends to animal hosts for zoonoses.

(Mockup; please change to the correct values.)

<entry description="COVID-19" name="COVID-19" term="COVID-19" wikidataURL="http://www.wikidata.org/entity/Q00000" wikipediaURL="https://en.wikipedia.org/wiki/COVID0000" wikidataID="Q00000">
  <synonym>COVID-Sars2-Cov</synonym>
  <related roleWikidataID="P2975" role="host" wikidataID="Q2000632">Rhinolophus Bat</related>
</entry>

Start of the query:

# Organization
SELECT ?organism ?organismLabel ?host ?hostLabel 

WHERE {
  ?organism wdt:P2975 ?host .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000

And as a URL-encoded link:

https://query.wikidata.org/#%23%20Organization%0ASELECT%20%3Forganism%20%3ForganismLabel%20%3Fhost%20%3FhostLabel%20%0A%0AWHERE%20%7B%0A%20%20%3Forganism%20wdt%3AP2975%20%3Fhost%20.%0A%20%20%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2020000000
petermr commented 3 years ago

Searching for related items in dictionaries

<entry description="South Korean multinational conglomerate" name="Samsung" term="Samsung" wikidataURL="http://www.wikidata.org/entity/Q20716" wikipediaURL="https://en.wikipedia.org/wiki/Samsung" wikidataID="Q20716">
  <synonym>Samsung chaebol</synonym>
  <synonym>Samsung Group</synonym>
  <related roleWikidataID="P17" role="country" wikidataID="Q884">South Korea</related>
  <related roleWikidataID="P3153" role="crossrefid" wikidataID="">100004358</related>
</entry>

To find "all organizations in South Korea" we will use XPath.

In words: "every entry that has a related child with roleWikidataID of P17 and wikidataID of Q884".

XPath:

/*/entry[related[@roleWikidataID='P17' and @wikidataID='Q884']]

Python's ElementTree supports only a LIMITED subset of XPath, so an expression like this one may not work there as written.
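
One way around the limited XPath support is to do the filtering in plain Python. This is only a sketch: the function name `entries_with_related`, the `<dictionary>` root, and the second entry (with placeholder IDs `Q00000` and `Q00001`) are assumptions for illustration. A library with full XPath 1.0 support, such as lxml, should be able to evaluate the expression above directly.

```python
import xml.etree.ElementTree as ET


def entries_with_related(root, role_pid, wikidata_id):
    """Return entries having a <related> child with the given property and value IDs."""
    matches = []
    for entry in root.iter("entry"):
        if any(
            related.get("roleWikidataID") == role_pid
            and related.get("wikidataID") == wikidata_id
            for related in entry.findall("related")
        ):
            matches.append(entry)
    return matches


# Hypothetical dictionary: the second entry uses placeholder IDs only.
SEARCH_SAMPLE = """<dictionary title="organization">
  <entry name="Samsung" term="Samsung" wikidataID="Q20716">
    <related roleWikidataID="P17" role="country" wikidataID="Q884">South Korea</related>
    <related roleWikidataID="P3153" role="crossrefid" wikidataID="">100004358</related>
  </entry>
  <entry name="Example organization" term="Example organization" wikidataID="Q00000">
    <related roleWikidataID="P17" role="country" wikidataID="Q00001">Example country</related>
  </entry>
</dictionary>"""

search_root = ET.fromstring(SEARCH_SAMPLE)
for entry in entries_with_related(search_root, "P17", "Q884"):
    print(entry.get("name"))
```

Both conditions are checked on the same `related` element, which is what the `and` inside the XPath predicate expresses.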

petermr commented 3 years ago

requirements arising from old dictionaries (automation)