gshaw / notes

Issues and solutions I find during software development.
https://gshaw.ca
MIT License

How to get conservation status of a species using WikiData #16

Open gshaw opened 1 year ago

gshaw commented 1 year ago
  1. Use the Wikipedia REST API to get the page summary for the species by its scientific name. E.g., https://en.wikipedia.org/api/rest_v1/page/summary/Branta_canadensis
  2. Parse the JSON response for the wikibase_item value. This will be a string starting with Q, e.g., Q26733.
  3. Request the Wikidata item for that ID using the Wikidata REST API.
  4. Parse the JSON response looking for the "statement" of interest. The IUCN conservation status property code is P141.
  5. Look at the contents of that statement. It will be an array, but there should be only one element. In that element, read the content key of the value dictionary. This will be another identifier made of Q followed by a number, e.g., Q211005. You can look up the details of any identifier on Wikidata, e.g., https://www.wikidata.org/wiki/Q211005
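
Steps 1 and 3 above come down to building two URLs and fetching JSON. The following is a minimal sketch using only the Python standard library, not a finished client. The function names and the User-Agent string are placeholders I made up for illustration; the two base URLs are the ones used in this issue. Wikimedia asks API clients to send a descriptive User-Agent, so the sketch sets one explicitly.

```python
import json
import urllib.parse
import urllib.request

# Base URLs taken from the examples in this issue.
WIKIPEDIA_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/"
WIKIDATA_ITEM = "https://www.wikidata.org/w/rest.php/wikibase/v0/entities/items/"

# Placeholder value (assumption): replace with something that identifies your tool.
USER_AGENT = "conservation-status-notes/0.1 (placeholder contact)"


def summary_url(scientific_name: str) -> str:
    """Step 1: URL of the Wikipedia page summary for a scientific name."""
    title = scientific_name.strip().replace(" ", "_")
    return WIKIPEDIA_SUMMARY + urllib.parse.quote(title, safe="")


def item_url(qid: str) -> str:
    """Step 3: URL of the Wikidata item for an ID such as Q26733."""
    return WIKIDATA_ITEM + urllib.parse.quote(qid, safe="")


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json(summary_url("Branta canadensis"))` would request the Canada Goose summary, and `fetch_json(item_url("Q26733"))` would request its Wikidata item.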
gshaw commented 1 year ago

Wikipedia REST API for Canada Goose: https://en.wikipedia.org/api/rest_v1/page/summary/Branta_canadensis

{
  "type": "standard",
  "title": "Canada goose",
  "displaytitle": "<span class=\"mw-page-title-main\">Canada goose</span>",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q26733",
  "titles": {
    "canonical": "Canada_goose",
    "normalized": "Canada goose",
    "display": "<span class=\"mw-page-title-main\">Canada goose</span>"
  },
  "pageid": 218972,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/40/Canada_goose_on_Seedskadee_NWR_%2827826185489%29.jpg/320px-Canada_goose_on_Seedskadee_NWR_%2827826185489%29.jpg",
    "width": 320,
    "height": 243
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/4/40/Canada_goose_on_Seedskadee_NWR_%2827826185489%29.jpg",
    "width": 4281,
    "height": 3256
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "1157644946",
  "tid": "90c754b0-0837-11ee-bf11-b9c3f5ef2b97",
  "timestamp": "2023-05-30T01:28:18Z",
  "description": "Species of goose native to the Northern Hemisphere",
  "description_source": "local",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Canada_goose",
      "revisions": "https://en.wikipedia.org/wiki/Canada_goose?action=history",
      "edit": "https://en.wikipedia.org/wiki/Canada_goose?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Canada_goose"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Canada_goose",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Canada_goose",
      "edit": "https://en.m.wikipedia.org/wiki/Canada_goose?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Canada_goose"
    }
  },
  "extract": "The Canada goose, sometimes called Canadian goose, is a large wild goose with a black head and neck, white cheeks, white under its chin, and a brown body. It is native to the arctic and temperate regions of North America, and it is occasionally found during migration across the Atlantic in northern Europe. It has been introduced to the United Kingdom, Ireland, Finland, Sweden, Denmark, New Zealand, Japan, Chile, Argentina, and the Falkland Islands. Like most geese, the Canada goose is primarily herbivorous and normally migratory; often found on or close to fresh water, the Canada goose is also common in brackish marshes, estuaries, and lagoons.",
  "extract_html": "<p>The <b>Canada goose</b>, sometimes called <b>Canadian goose</b>, is a large wild goose with a black head and neck, white cheeks, white under its chin, and a brown body. It is native to the arctic and temperate regions of North America, and it is occasionally found during migration across the Atlantic in northern Europe. It has been introduced to the United Kingdom, Ireland, Finland, Sweden, Denmark, New Zealand, Japan, Chile, Argentina, and the Falkland Islands. Like most geese, the Canada goose is primarily herbivorous and normally migratory; often found on or close to fresh water, the Canada goose is also common in brackish marshes, estuaries, and lagoons.</p>"
}
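
Pulling the item ID out of a summary response like the one above could look like the sketch below. The helper name `wikibase_item` and the format check are my own additions. I am assuming the key can be missing for pages that have no linked item, so the function returns `None` rather than raising.

```python
import re
from typing import Optional

# Item IDs look like "Q" followed by a positive number, e.g. Q26733.
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def wikibase_item(summary: dict) -> Optional[str]:
    """Return the wikibase_item value if present and well formed, else None."""
    qid = summary.get("wikibase_item")
    if isinstance(qid, str) and QID_PATTERN.fullmatch(qid):
        return qid
    return None


# Trimmed copy of the Canada Goose response shown above.
sample_summary = {
    "type": "standard",
    "title": "Canada goose",
    "wikibase_item": "Q26733",
}
print(wikibase_item(sample_summary))  # Q26733
```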
gshaw commented 1 year ago

Getting the Wikidata item for Canada Goose, Q26733 (see previous comment): https://www.wikidata.org/w/rest.php/wikibase/v0/entities/items/Q26733

Look in the (large) JSON response for the statements under P141 (IUCN conservation status). The following snippet shows only one of the dozens of properties attached to the item, with some keys removed. Unfortunately, the statements are not sorted by property key. Inspect the content key of value to get Q211005, which is the identifier for Least Concern. Phew!

"P141": [
  {
    "id": "Q26733$47CFEEBD-8A6E-4AA3-8BBA-AC8379A6DCD9",
    "rank": "normal",
    "qualifiers": [],
    "references": [
    ],
    "property": {
    },
    "value": {
      "type": "value",
      "content": "Q211005"
    }
  }
]
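
Reading the status out of that structure could be sketched as follows. A few assumptions: the property arrays sit under a top-level "statements" key in the item response, a statement ranked "preferred" should win over "normal" if there is ever more than one, and "deprecated" statements should be ignored. The function name is hypothetical, and the label table contains only the one value confirmed in this issue.

```python
from typing import Optional

IUCN_STATUS = "P141"  # IUCN conservation status property

# Only the mapping confirmed above; other statuses would need to be looked up.
KNOWN_STATUS_LABELS = {"Q211005": "Least Concern"}

# Lower number wins; ranks not listed here (e.g. "deprecated") are skipped.
RANK_ORDER = {"preferred": 0, "normal": 1}


def iucn_status_qid(item: dict) -> Optional[str]:
    """Return the item ID stored in the best-ranked P141 statement, or None."""
    statements = item.get("statements", {}).get(IUCN_STATUS, [])
    candidates = [
        s for s in statements
        if s.get("rank") in RANK_ORDER
        and s.get("value", {}).get("type") == "value"
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: RANK_ORDER[s["rank"]])
    return best["value"].get("content")


# The snippet above, wrapped in the assumed top-level "statements" key.
sample_item = {
    "statements": {
        "P141": [
            {
                "id": "Q26733$47CFEEBD-8A6E-4AA3-8BBA-AC8379A6DCD9",
                "rank": "normal",
                "value": {"type": "value", "content": "Q211005"},
            }
        ]
    }
}
qid = iucn_status_qid(sample_item)
print(qid, KNOWN_STATUS_LABELS.get(qid, "unknown"))  # Q211005 Least Concern
```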
Lower Risk
Threatened
Extinct
Other

There should also be one for NE (Not Evaluated), but I'm not sure what it is.

gshaw commented 1 year ago

Other interesting WikiData properties related to birds.

gshaw commented 1 year ago

My approach doesn't quite work because of differences in the scientific names used by eBird, Wikipedia, Wikidata, and IUCN. E.g., some scientific names used by eBird differ from those used by IUCN. I wrote a script to import the conservation status from IUCN into the scientific names used by eBird, but that causes a lot of conflicts because Wikidata only allows one record to have a given IUCN taxon ID. This required me to fix hundreds of conflicts, mostly via script and a few by hand.
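
To illustrate the kind of conflict described above, here is a small sketch that finds IUCN taxon IDs claimed by more than one scientific name before anything is written to Wikidata. It is not the script from the gist. The function name, the names, and the IDs are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, List


def find_taxon_id_conflicts(taxon_id_by_name: Dict[str, str]) -> Dict[str, List[str]]:
    """Group names by IUCN taxon ID and keep only IDs used by several names."""
    names_by_taxon_id: Dict[str, List[str]] = defaultdict(list)
    for name, taxon_id in taxon_id_by_name.items():
        names_by_taxon_id[taxon_id].append(name)
    return {
        taxon_id: sorted(names)
        for taxon_id, names in names_by_taxon_id.items()
        if len(names) > 1
    }


# Made-up data: two names pointing at the same taxon ID is a conflict.
example = {
    "Genus alpha": "1001",
    "Genus beta": "1001",
    "Genus gamma": "1002",
}
print(find_taxon_id_conflicts(example))  # {'1001': ['Genus alpha', 'Genus beta']}
```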

It's cool how easy it is to jump in and contribute to Wikidata, but it's also difficult if you aren't sure who to ask for guidance. I'm still uncertain how to proceed down this route. Ultimately I think leveraging Wikidata is the right thing to do, but I'm not sure of the best way to handle the differences with eBird at this time. I'm giving it more time as this isn't on the critical path.

Reference script code and run output: https://gist.github.com/gshaw/a326394f260def9bef66765b0959f89a

Also, shout out to the wikibase-cli tool. Fantastic for scripting as the Wikibase API isn't trivial to use to modify records.