rdmpage / australia

Collaborations in Australia

Adding taxonomic literature to ALA (and/or provider databases). #6

Open rdmpage opened 5 years ago

rdmpage commented 5 years ago

Most large biodiversity databases treat literature as a second-class citizen: they often omit the literature that published a name, or, if they include it, they omit a digital identifier (such as a DOI) or a link to a digital version of the paper (e.g., in a repository or in BHL). ALA is an example of this lack of linking.

Through projects such as BioNames and Ozymandias, many taxonomic publications are known to have identifiers, so we would like to add these to ALA (and give them to anyone else who would like them). This would let visitors to ALA click on a link and go straight to the publication, in other words, to the evidence behind a particular taxonomic decision. The question is: how do we do this?

rdmpage commented 5 years ago

One approach to bootstrap this is to create a DarwinCore Archive version of the augmented AFD dataset used to populate Ozymandias. Generate basic taxonomy dump and add references (just like The Plant List). Maybe restrict it to just references with a DOI. Use GBIF references extension to encode references. Upload to GBIF, then use as a demo to ALA.
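A rough Python sketch of the two data files such an archive might contain: a taxon core and a references extension keyed on the same taxon identifier, with the optional DOI-only filter. The column names follow my understanding of the Darwin Core Taxon terms and the GBIF references extension (`identifier`, `bibliographicCitation`) and should be checked against the published definitions. The `meta.xml` descriptor a real archive needs is omitted, and the function names and record layout are made up for illustration.

```python
import csv
import io

# Believed to be the rowType of the GBIF references extension; verify before use.
REFERENCE_ROWTYPE = "http://rs.gbif.org/terms/1.0/Reference"


def write_tsv(header, rows):
    """Serialise a header plus rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def build_tables(taxa, references, doi_only=True):
    """Return (taxon core, references extension) as TSV strings.

    taxa:       dicts with 'taxonID' and 'scientificName' (hypothetical layout)
    references: dicts with 'taxonID', 'doi' (or None) and 'citation'
    doi_only:   if True, drop references that have no DOI
    """
    taxon_rows = [(t["taxonID"], t["scientificName"]) for t in taxa]
    ref_rows = [
        (r["taxonID"], r["doi"] or "", r["citation"])
        for r in references
        if r["doi"] or not doi_only
    ]
    taxon_tsv = write_tsv(("taxonID", "scientificName"), taxon_rows)
    ref_tsv = write_tsv(
        ("taxonID", "identifier", "bibliographicCitation"), ref_rows
    )
    return taxon_tsv, ref_tsv


if __name__ == "__main__":
    # Placeholder records only; the DOI and citation below are not real.
    taxa = [{"taxonID": "urn:lsid:example:taxon:1", "scientificName": "Genus species"}]
    refs = [{"taxonID": "urn:lsid:example:taxon:1",
             "doi": "10.0000/placeholder",
             "citation": "Placeholder citation"}]
    for table in build_tables(taxa, refs):
        print(table)
```

Keeping the extension keyed on `taxonID` means each taxon can carry any number of references, which is how the "star schema" of a Darwin Core Archive is meant to be used.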

Note that AFD is CC-BY according to the ALA page for Australian Faunal Directory.

The AFD taxon identifiers are often included in GBIF occurrences, e.g. https://www.gbif.org/occurrence/1916628045 has urn:lsid:biodiversity.org.au:afd.taxon:2634171f-c9e7-4f4a-9774-35ec20210765 for Rhipidura leucophrys (see also https://ozymandias-demo.herokuapp.com/uri/https://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:afd.taxon:2634171f-c9e7-4f4a-9774-35ec20210765 in Ozymandias). So we can use these identifiers in the literature dump (although for reasons that surpass understanding, ALA doesn't use the AFD name identifiers).
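As a small illustration of how the same LSID ties these pages together, the Python below rebuilds the ALA species URL and the Ozymandias URL from the identifier carried on the GBIF occurrence. The URL patterns are simply read off the example links above; the function names are invented here, and nothing guarantees the patterns hold for every taxon.

```python
BIE_BASE = "https://bie.ala.org.au/species/"
OZYMANDIAS_BASE = "https://ozymandias-demo.herokuapp.com/uri/"


def bie_url(lsid):
    """ALA species page URL for an AFD taxon LSID (pattern taken from the example)."""
    return BIE_BASE + lsid


def ozymandias_url(lsid):
    """Ozymandias demo URL, which wraps the ALA species URL (pattern taken from the example)."""
    return OZYMANDIAS_BASE + bie_url(lsid)


if __name__ == "__main__":
    lsid = "urn:lsid:biodiversity.org.au:afd.taxon:2634171f-c9e7-4f4a-9774-35ec20210765"
    print(bie_url(lsid))
    print(ozymandias_url(lsid))
```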

rdmpage commented 5 years ago

@nicolekearney has created a collection of Australian Herbarium literature on BHL: https://www.biodiversitylibrary.org/collection/AustralianHerbariaJournals

rdmpage commented 5 years ago

Demo of possible ALA use of literature: https://ozymandias-demo.herokuapp.com/alademo.php?q=Maricoccus+brucei+Poore%2C+1994 Need to format it better, add sources, and provide examples.

nicolekearney commented 5 years ago

Examples of species descriptions in Australian Museum journals to be used as examples for https://ozymandias-demo.herokuapp.com

Records of the Western Australian Museum: https://ozymandias-demo.herokuapp.com/alademo.php?q=Diplodactylus+nebulosus+Doughty+%26+Oliver%2C+2013

Memoirs of Museums Victoria: https://ozymandias-demo.herokuapp.com/alademo.php?q=Paraulopus+balteatus+Gomon%2C+2010

Memoirs of the Queensland Museum: https://ozymandias-demo.herokuapp.com/alademo.php?q=Macropanesthia+intermorpha+Rose%2C+Walker+%26+Woodward%2C+2014

Records of the Australian Museum: https://ozymandias-demo.herokuapp.com/alademo.php?q=Pallidelix+Iredale%2C+1933

nicolekearney commented 5 years ago

Mock-up of a possible layout for the proposed changes to the ALA names tab (adding links to literature from existing citations): Linking ALA taxon names to literature.pdf

rdmpage commented 5 years ago

I’ve tweaked the tool to resemble your mock-up more closely, and added some examples. Do you think this is good enough to send out to other folks?

rdmpage commented 5 years ago

Live demo here: https://ozymandias-demo.herokuapp.com/alademo.php

Example to try: https://ozymandias-demo.herokuapp.com/alademo.php?q=Pauropsalta+herveyensis+Owen+%26+Moulds%2C+2016
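To try other names, the query string just needs the full name (with authorship) form-encoded into the `q` parameter. A minimal Python sketch, assuming the demo keeps accepting names this way; `demo_url` is a helper name made up here.

```python
from urllib.parse import quote_plus

DEMO_BASE = "https://ozymandias-demo.herokuapp.com/alademo.php"


def demo_url(name):
    """Build a demo link for a taxon name, encoding spaces as '+' and escaping '&' and ','."""
    return DEMO_BASE + "?q=" + quote_plus(name)


if __name__ == "__main__":
    print(demo_url("Pauropsalta herveyensis Owen & Moulds, 2016"))
```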

rdmpage commented 5 years ago

APNI ids are in Wikidata (via Greg Whitbread), e.g. https://www.wikidata.org/wiki/Q17400729 Solanum eardleyae.

rdmpage commented 5 years ago

From an email from Anne Fuchs (environment.gov.au) regarding persistence of AFD identifiers.

After doing some investigation into AFD identifiers I have found the following:-

  • Every time a name or taxon is ‘updated’ in AFD a new UUID is generated. An update seems to occur each time a branch of the tree is checked back into AFD. The UUID is used as part of the LSID exposed in the ALA, i.e. urn:lsid:biodiversity.org.au:afd.taxon:36752d3b-1d5f-4517-812b-cd52c81f8785
  • At some point an effort was made to generate a stable name id but this does not seem to be provided and/or used in the ALA. I am not sure if ‘names’ are exposed in any way so I will follow this up.
  • The formats afd.name, afd.taxon and afd.publication were part of the TCS (XML) and SPARQL (RDF) prototype that was done in approximately 2014. From what I can glean the ALA wanted to work with Darwin Core files and so most of this work was taken up by only a few consumers, yourself included! Unfortunately, when we moved our infrastructure to CSIRO, and with the rewrite of the NSL for APNI/APC, these interfaces were not kept up to date, and although the SPARQL interface still runs, the data has not been updated in some time.
  • To try and address this and the other resolution issues:
  • The afd.name format (http://biodiversity.org.au/afd.name/323792) now at least resolves to an AFD page (but only if it is the previously published persistent id for this name from the TCS/SPARQL work; other name ids or UUIDs do not resolve)
  • The requests for https://biodiversity.org.au/afd/taxa/36752d3b-1d5f-4517-812b-cd52c81f8785 OR https://biodiversity.org.au/afd.taxon/36752d3b-1d5f-4517-812b-cd52c81f8785 do work and redirect to the AFD page for the current taxon
  • We need to make the linked data services for the NSL (and AFD) work properly.
  • We need to make exports available for AFD with the persistent identifiers included, as an alternative while the mapping and linked data work is being done
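For anyone wanting to test the resolution behaviour described in the email, here is a small Python sketch that splits an AFD taxon LSID into its namespace and UUID and builds the two URL forms reported to redirect to the current taxon page. The URL patterns are copied from the email; the function names are hypothetical, and the code makes no network request, so it says nothing about whether a given UUID still resolves.

```python
AFD_AUTHORITY = "biodiversity.org.au"


def parse_afd_lsid(lsid):
    """Split 'urn:lsid:biodiversity.org.au:<namespace>:<id>' into (namespace, id)."""
    parts = lsid.split(":")
    if len(parts) != 5 or parts[:3] != ["urn", "lsid", AFD_AUTHORITY]:
        raise ValueError(f"not an AFD LSID: {lsid!r}")
    return parts[3], parts[4]


def taxon_resolver_urls(lsid):
    """Return the two URL forms the email says redirect to the AFD taxon page."""
    namespace, object_id = parse_afd_lsid(lsid)
    if namespace != "afd.taxon":
        raise ValueError(f"expected an afd.taxon LSID, got namespace {namespace!r}")
    return [
        f"https://{AFD_AUTHORITY}/afd/taxa/{object_id}",
        f"https://{AFD_AUTHORITY}/afd.taxon/{object_id}",
    ]


if __name__ == "__main__":
    lsid = "urn:lsid:biodiversity.org.au:afd.taxon:36752d3b-1d5f-4517-812b-cd52c81f8785"
    for url in taxon_resolver_urls(lsid):
        print(url)
```

Because a new UUID is generated on every update, links built this way are only as stable as the UUID itself, which is the reason the persistent name ids matter.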