tdwg / dwc-qa

Public question and answer site for discussions about Darwin Core
Apache License 2.0

Indet and morphospecies - Darwin Core Hour Input Form 11/4/2020 15:35:51 #162

Open iDigBioBot opened 3 years ago

iDigBioBot commented 3 years ago

A user submitted this information via the Darwin Core Hour webform:

Timestamp: 11/4/2020 15:35:51
Please provide a topic of interest: how to deal with indet and morphospecies in DwC or ABCD
Are you capable of and interested in participating: Yes
Who else would you recommend to participate in the presentation: Katja Seltmann
What resources can you point to: I'd like to know if we have great examples of what the community is doing with these names when trying to map to DwC or ABCD. And, I'd like to know what we should be hoping for / expecting from an aggregator stand-point, when it comes to trying to ingest such names against a backbone that will not contain them (at least for now). This is needed to manage data provider expectations.
Your name: Deborah Paul
Your email: dlpaul@illinois.edu
Your GitHub username: @debpaul

tucotuco commented 3 years ago

@bpescador has posed questions along the same lines in email very recently.

I am still not clear on how to map identifications that are not valid species. Below I use an example from AntWeb. I understand that "cf" should be pulled out and added to identificationQualifier. But how do I handle the morphospecies? Can you give an example? For reference, in the paper (https://doi.org/10.1371/journal.pone.0218904), they also mapped their morphospecies to identificationQualifier, but I don't see how that will work.

| Ex. | Family | subfamily | genus | species | comment |
|---|---|---|---|---|---|
| 1 | Formicidae | (Formicidae) | (Formicidae) | (indet) | identified only to family |
| 2 | Formicidae | Amblyoponinae | (Amblyoponinae) | (indet) | identified only to subfamily |
| 3 | Formicidae | Amblyoponinae | Amblyopone | (indet) | identified only to genus; do not use sp. or undet to indicate undetermined to species |
| 4 | Formicidae | Amblyoponinae | Adetomyrma | MG01 | morphospecies code for the Madagascar region |
| 5 | Formicidae | Amblyoponinae | Amblyopone | AFRC-TZ01 | morphospecies code for the AFRC collection |
| 6 | Formicidae | Formicinae | Formica | planipilis_cf | To indicate uncertainty in a name, append _cf or _nr following the name. Use _cf instead of "Formica planipilis?" |
| 7 | Formicidae | Ponerinae | Anochetus | graeffei_nr | _cf for taxa that are close to or conspecific with the named taxon, and _nr for taxa that are close to the named taxon but not conspecific |
| 8 | Formicidae | Myrmicinae | Stenamma | punctatoventre_cfCA01 | |
| 9 | Formicidae | Myrmicinae | Stenamma | punctatoventre_cfCA02 | punctatoventre_cfCA02 is distinct from punctatoventre_cfCA01 |
| 10 | Formicidae | Cerapachyinae | Cerapachyine_genus1 | MG01 | an unnamed genus in Cerapachyinae, unnamed species in Madagascar |

tucotuco commented 3 years ago

The reference Brian gives shows what one community is doing.

I am not qualified to speak about what any other communities are doing, but here's my take on what should be done in Darwin Core.

The term dwc:scientificName is supposed to contain full scientific names, ergo, not morphospecies. Instead, scientificName should contain the name of the lowest rank at which the organism can be identified. The rest should be relegated to the identificationQualifier term.  The morphospecies would then be recoverable from the combination of the lowest populated rank term (assuming it is genus or lower) plus a space plus the identificationQualifier. In addition, a complete morphospecies name can be provided in the term dwc:previousIdentifications. Just because its label is Previous Identifications doesn't mean that the current identification can't be one of them, especially in this case where the specimens may well get a full scientific name some day.

Thus, based on Brian's examples, here are the mappings I suggest for Darwin Core terms:

  1. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae', scientificName - 'Formicidae' or better yet 'Formicidae Latreille, 1809', family - 'Formicidae', genus - [empty], specificEpithet - [empty], identificationQualifier - 'indet.', previousIdentifications - 'Formicidae'.

  2. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Amblyoponinae', scientificName - 'Amblyoponinae' or better yet 'Amblyoponinae Forel, 1893', family - 'Formicidae', genus - [empty], specificEpithet - [empty], identificationQualifier - 'indet.', previousIdentifications - 'Amblyoponinae'.

  3. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Amblyoponinae | Amblyopone', scientificName - 'Amblyopone' or better yet 'Amblyopone Erichson, 1842', family - 'Formicidae', genus - 'Amblyopone', specificEpithet - [empty], identificationQualifier - [empty] or 'indet.', previousIdentifications - 'Amblyopone'.

  4. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Amblyoponinae | Adetomyrma', scientificName - 'Adetomyrma' or better yet 'Adetomyrma Ward, 1994', family - 'Formicidae', genus - 'Adetomyrma', specificEpithet - [empty], identificationQualifier - 'MG01', previousIdentifications - 'Adetomyrma MG01'.

  5. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Amblyoponinae | Amblyopone', scientificName - 'Amblyopone' or better yet 'Amblyopone Erichson, 1842', family - 'Formicidae', genus - 'Amblyopone', specificEpithet - [empty], identificationQualifier - 'AFRC-TZ01', previousIdentifications - 'Amblyopone AFRC-TZ01'.

  6. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Formicinae | Formica', scientificName - 'Formica planipilis' or better yet 'Formica planipilis Creighton, 1940', family - 'Formicidae', genus - 'Formica', specificEpithet - 'planipilis', identificationQualifier - 'cf. planipilis', previousIdentifications - 'Formica cf. planipilis'.

  7. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Ponerinae | Anochetus', scientificName - 'Anochetus graeffei' or better yet 'Anochetus graeffei Mayr, 1870', family - 'Formicidae', genus - 'Anochetus', specificEpithet - 'graeffei', identificationQualifier - 'nr. graeffei', previousIdentifications - 'Anochetus nr. graeffei'.

  8. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Myrmicinae | Stenamma', scientificName - 'Stenamma punctatoventre' or better yet 'Stenamma punctatoventre Snelling, 1973', family - 'Formicidae', genus - 'Stenamma', specificEpithet - 'punctatoventre', identificationQualifier - 'cf. CA01', previousIdentifications - 'Stenamma punctatoventre cf. CA01'.

  9. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Myrmicinae | Stenamma', scientificName - 'Stenamma punctatoventre' or better yet 'Stenamma punctatoventre Snelling, 1973', family - 'Formicidae', genus - 'Stenamma', specificEpithet - 'punctatoventre', identificationQualifier - 'cf. CA02', previousIdentifications - 'Stenamma punctatoventre cf. CA02'.

  10. higherClassification - 'Animalia | Arthropoda | Insecta | Hymenoptera | Formicidae | Cerapachyinae', scientificName - 'Cerapachyinae' or better yet 'Cerapachyinae Forel, 1893', family - 'Formicidae', genus - [empty], specificEpithet - [empty], identificationQualifier - 'gen. Cerapachyine_genus1 sp. MG01', previousIdentifications - 'Cerapachyine_genus1 MG01'.
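
For anyone who wants to try these mappings on their own data, here is a minimal Python sketch of items 3 through 9. It is only an illustration of the suggestions above, not an established tool: the function name `map_antweb_name` and the regular expressions are hypothetical, it assumes the genus column holds a real genus name (so the parenthesized placeholders in rows 1 and 2 and the unnamed genus in row 10 are not handled), and it assumes any species string that is not purely lowercase letters is a morphospecies code. What to put in previousIdentifications for an ordinary, unqualified name is not covered above, so the sketch leaves it empty.

```python
import re

# Hypothetical pattern for AntWeb-style qualified names such as
# "planipilis_cf", "graeffei_nr" or "punctatoventre_cfCA01".
QUALIFIED = re.compile(r"^(?P<epithet>[a-z]+)_(?P<sign>cf|nr)(?P<code>.*)$")


def map_antweb_name(genus: str, species: str) -> dict:
    """Map a genus plus AntWeb-style species string to Darwin Core terms."""
    record = {
        "scientificName": genus,
        "genus": genus,
        "specificEpithet": "",
        "identificationQualifier": "",
        "previousIdentifications": genus,
    }

    # Item 3: identified only to genus.
    if species == "(indet)":
        record["identificationQualifier"] = "indet."
        return record

    # Items 6-9: a named species with a cf/nr sign and an optional code.
    match = QUALIFIED.match(species)
    if match:
        epithet, sign, code = match["epithet"], match["sign"], match["code"]
        record["scientificName"] = f"{genus} {epithet}"
        record["specificEpithet"] = epithet
        if code:
            record["identificationQualifier"] = f"{sign}. {code}"
            record["previousIdentifications"] = f"{genus} {epithet} {sign}. {code}"
        else:
            record["identificationQualifier"] = f"{sign}. {epithet}"
            record["previousIdentifications"] = f"{genus} {sign}. {epithet}"
        return record

    # An ordinary epithet: a full scientific name, no qualifier needed.
    if re.fullmatch(r"[a-z]+", species):
        record["scientificName"] = f"{genus} {species}"
        record["specificEpithet"] = species
        record["previousIdentifications"] = ""  # assumption: not covered above
        return record

    # Items 4-5: anything else is treated as a morphospecies code.
    record["identificationQualifier"] = species
    record["previousIdentifications"] = f"{genus} {species}"
    return record
```

For example, `map_antweb_name("Adetomyrma", "MG01")` keeps scientificName at 'Adetomyrma' and moves 'MG01' into identificationQualifier, as in item 4.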

dimus commented 3 years ago

@tucotuco, I am worried that with this approach the scientificName term ends up containing elements that are quite different semantically and structurally: 'normal' scientific name strings as well as 'surrogate' name strings, even though they might look quite similar. The identificationQualifier seems to be free-form, and would require parsing effort to qualify names correctly. On top of that, there is no "linter" for DwC that would enforce uniform placement of elements, so in the wild many different overlapping approaches would appear, sometimes even within one dataset. I know you do not like the idea of a 'verbatim' field (something like ScientificNameString for example), but that would help people like me, as I can use gnparser to break names into categories for my projects.
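
To make the "linter" idea concrete, here is a rough Python sketch of the kind of check a data provider could run over scientificName values before publishing. It is an assumption-laden illustration, not an existing DwC tool and not a substitute for a real name parser such as gnparser: the function name `lint_scientific_name`, the list of open-nomenclature signs, and the pattern for morphospecies-like codes are all hypothetical and would need tuning for a given community.

```python
import re

# Hypothetical list of open-nomenclature signs, matched as stand-alone tokens,
# plus the "?" and "_" characters used in some local conventions.
OPEN_NOMENCLATURE = re.compile(
    r"(?<![A-Za-z])(?:cf|aff|nr|sp|indet|undet)(?![A-Za-z])|[?_]"
)

# Hypothetical pattern for codes such as "MG01" or "AFRC-TZ01":
# uppercase letters (optionally with hyphens) followed by digits.
MORPHO_CODE = re.compile(r"\b[A-Z][A-Z-]*\d+\b")


def lint_scientific_name(value: str) -> list:
    """Return a list of reasons why value may not be a full scientific name."""
    problems = []
    if OPEN_NOMENCLATURE.search(value):
        problems.append("open-nomenclature sign")
    if MORPHO_CODE.search(value):
        problems.append("morphospecies-like code")
    return problems
```

With these patterns, 'Formica planipilis Creighton, 1940' passes cleanly, while 'Formica cf. planipilis' and 'Adetomyrma MG01' are each flagged.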

tucotuco commented 3 years ago

I have nothing against a verbatim field, and agree that it would help with this issue. There is an open request for a new term 'verbatimScientificName' which is stalled at the stage of needing evidence for demand.

https://github.com/tdwg/dwc/issues/181



tammyhorton commented 3 years ago

We have begun working on this and have just submitted a paper to Frontiers which makes some recommendations on how to deal with morphospecies in Darwin Core fields. We focus on the application of morphotaxa in deep-sea image analysis, where we come across numerous incomplete IDs and need to make use of open nomenclature (ON). It was clear to us in the deep-sea field that clarification of how to structure these names and improve consistency in Darwin Core was needed. We recommended the following:

scientificName should be at the lowest possible taxonomic rank, preferably at species level or lower, but higher ranks, such as genus, family, order, class etc. are also acceptable.

The scientificName term should only contain the name and not identification qualifications.

ON signs should be incorporated in the identificationQualifier field.

Remarks about an identification should be included in the identificationRemarks field.

Darwin Core also includes a taxonConceptID field, defined as “An identifier for the taxonomic concept to which the record refers - not for the nomenclatural details of a taxon” (dwc.tdwg.org/terms/), that can be used to form a namestring that combines scientificName and identificationQualifier. For names that are not code-compliant, a namestring or code that combines the scientificName and the identificationQualifier similarly serves as the taxonConceptID, so we recommend this field for these entries.
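
As a rough illustration of the namestring idea, the Python sketch below joins scientificName and identificationQualifier with a single space. The function name `name_string` and the choice of separator are assumptions made for the example; the text above does not prescribe an exact format.

```python
def name_string(scientific_name: str, identification_qualifier: str = "") -> str:
    """Combine scientificName and identificationQualifier into one namestring.

    Assumption: the two parts are joined by a single space, and an empty
    qualifier leaves the scientific name unchanged.
    """
    parts = (scientific_name.strip(), identification_qualifier.strip())
    return " ".join(part for part in parts if part)
```

With this convention, a genus-level record with scientificName 'Gadus' and identificationQualifier 'cf. morhua' yields 'Gadus cf. morhua', and a record with no qualifier yields the scientificName as given.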

bpescador commented 3 years ago

@tammyhorton In your system, does dwc:taxonRank = the rank of the taxonConceptID field (code-compliant + non-code-compliant names) or just the rank of the code-compliant portion? Also curious: if you have a specimen identified only at a higher rank (e.g. family), does the taxonConceptID field = scientificName?

tammyhorton commented 3 years ago

@bpescador since for upload via OBIS we need to ensure that the marine taxa are already available in WoRMS, when a valid scientificNameID is provided the taxonRank will correspond to the rank of the scientificName. See OBIS manual and the example: https://obis.org/manual/darwincore/#occurrence

| scientificName | scientificNameAuthorship | scientificNameID | taxonRank | identificationQualifier |
|---|---|---|---|---|
| Lanice conchilega | Pallas, 1766 | urn:lsid:marinespecies.org:taxname:131495 | species | |
| Gadus | Linnaeus, 1758 | urn:lsid:marinespecies.org:taxname:125732 | genus | cf. morhua |

So where we have an ID only to genus (or family or higher), the taxonRank is the rank of the name entered in scientificName.