Closed by acvaughan 8 years ago
I have labeled this with 'new term', but there already is a term in the HISPID extension (see http://hiscom.rbg.vic.gov.au/wiki/HISPID_5_for_HISPID_Users#NameFormula). However, the structure of that element is completely different from what is suggested in the minutes of the 2012 HISCOM meeting.
I would like to know first how many collections databases still store taxon names like this. I think most don't and for those herbaria it will be very hard or impossible to deliver atomised hybrid formulae (and others will refuse to do so). I also think that even when hybrid formulae (or scientific names for that matter) are stored in atomised form, the best exchange format is still the concatenated string, as long as it is done consistently.
Since the 2012 HISCOM meeting I have discovered that there is an Extension element under Identification in ABCD as well (//element(*,Identification)/Result/Extension). It has always been there, I just hadn't looked hard enough before. We (MEL and MELU) currently use it to deliver identificationID and taxonRank. We could have a HISPID Identification extension as well, if we can think of anything that should be in there (I am not really a fan of this one).
Having looked at the hybrid names and the initial requirement, which is to provide atomised names where possible (i.e. if your database can), I think the way forward is to describe a separate set of elements that can be repeated in such a way that each component of the name can be delivered. The full unadulterated name would still be in name.
I've added an Excel file with examples to the GIT repository hispid-review-2014-15/HISPIDreview-hybridNames.xlsx
I'm not sure if Neils' suggestion of providing parent1, parent2 still stands. For 'named' hybrids, the parentage should be available as part of the publication of the name when a new name is coined, but as many names are unpublished, maybe it does. For hybrid formulae, the proposed structure can handle the name elements of each parent.
Over for discussion...
The difference between my solution and yours is that your solution seems to be about parsing up the hybrid name or formula, while in my solution the hybrid parents are taxa, not name elements. So your solution might fit better in ABCD, while mine fits better with Darwin Core (and, in my view, the HISPID we are trying to make).
All hybrids have two parent taxa. Named hybrids also have two parents, although you may only know one of them (one has to be known, or you can't have a nothotaxon). With named hybrids, hybridParent1 should never be the same as the scientificName. Hybrids with more than two names in the formula also have two parents, at least one of them being a hybrid itself. You probably don't know which one, so I wouldn't bother identifying the parents.
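The constraints in this comment can be sketched as a small check. This is only an illustration, not part of HISPID: the class `HybridRecord` and the function `check_parents` are hypothetical names, unknown parents are assumed to be represented as `None`, and applying the "must not equal scientificName" rule to hybridParent2 as well as hybridParent1 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HybridRecord:
    """Hypothetical record holding the terms discussed in this thread."""
    scientific_name: str                  # value of scientificName
    hybrid_parent1: Optional[str] = None  # hybridParent1; None when unknown
    hybrid_parent2: Optional[str] = None  # hybridParent2; None when unknown


def check_parents(rec: HybridRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    parents = [rec.hybrid_parent1, rec.hybrid_parent2]
    # At least one parent has to be known, or there is no nothotaxon.
    if all(p is None for p in parents):
        problems.append("at least one hybrid parent must be known")
    # A parent must not repeat the hybrid's own name
    # (extended here to hybridParent2 as an assumption).
    if rec.scientific_name in parents:
        problems.append("a hybrid parent must not equal scientificName")
    return problems
```

For example, a record for Mentha × piperita with parents Mentha aquatica and Mentha spicata passes, while a record that repeats the hybrid's own name as hybridParent1 is flagged.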
I don't think there is a requirement to provide atomised names. Atomised names are a pain.
Why can't a HISPID extension accommodate both? Parent1/Parent2 certainly provides data that is useful and fits into the DwC model, but the original requirement was to atomise the names in a way that is useful for ABCD consumers. The second model would certainly make this simpler. Also, when providing data to ITF, we need to atomise the names.
ABCD doesn't support atomised hybrid formulas, so there is no way to atomise names in a way that is useful for ABCD consumers. The best thing to do to support ABCD consumers and consumers who have hybrid formulas atomised in their database is to format the hybrid formula consistently. That's why, when HISPID was mapped against ABCD all the hybrid-related elements were removed. This time we decided to separate the standard from the transfer format.
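What "format the hybrid formula consistently" could mean in practice is not specified in this thread. A minimal sketch, assuming the convention is a multiplication sign with a single space on either side and no repeated whitespace (the function name `normalise_hybrid_formula` is hypothetical):

```python
import re


def normalise_hybrid_formula(formula: str) -> str:
    """Rewrite a hybrid formula string using one assumed convention."""
    # Treat a stand-alone "x", "X" or "×" between names as the hybrid sign
    # and replace it, with its surrounding whitespace, by " × ".
    text = re.sub(r"\s+[xX×]\s+", " × ", formula.strip())
    # Collapse any remaining runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text)
```

With this convention, "Epilobium billardiereanum  x   E. nerteroides" and "Epilobium billardiereanum X E. nerteroides" are both delivered as the same string, which is what a consumer needs in order to parse the formula back into their own atomised fields.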
My solution is really pretty much the same as yours when you try to fit your solution into HISPID. hybridParent1 and hybridParent2 are hooks, or references to full taxon records. When exchanged these could be the scientific names of the taxa, but in other implementations it could just as well be the identifiers for these taxa, as it is in our database, or the atomised names, as it is in your solution. I could have added hybridParent1ID and hybridParent2ID – which is similar to what you would need to do, as otherwise you can't have repeatable groups of elements – but so far we haven't included identifiers for taxa or scientific names in HISPID. What we are doing with HISPID is about definition, not so much implementation. Your solution is just one implementation of my solution and so is ITF 2 (although that has seniority; my solution is more a generalisation of what ITF 2 does and what HISPID used to do). In an ABCD-ish implementation, hybridParent1 and hybridParent2 would have the abcd:TaxonIdentified data type. It is similar to what we have done with previousIdentifications, where in the usage notes we say that it is actually better to deliver the entire identification history as nested elements (both ABCD and Darwin Core implementations allow for both; I don't think we should say that, by the way). So, when implemented in ABCD, each previousIdentification is an element with data type abcd:Identification.
The real difference between the two solutions is (1) that my solution only allows for two hybrid parents, while in your solution it can be any number, and (2) that for named hybrids in your solution hybridParent1 is the same as the hybridChild, while in my solution it can't be.
Why are we limited to 2 hybrid parents?
We have specimens that are artificial hybrids with known parentage - e.g., https://scd.landcareresearch.co.nz/Specimen/CHR_531488 (note that the data entry on this specimen is a mess from 2 database systems ago)
which has: (Epilobium billardiereanum (F) × E. nerteroides) × E. billardiereanum. Does this mean that hybridParent1 would itself be a hybrid formula?
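One way to read this formula under the two-parent view argued earlier in the thread is as a nested structure in which a parent is either a plain taxon name or itself a hybrid. The sketch below is illustrative only: the `Hybrid` class and `render` function are hypothetical, and the "(F)" female-parent marker is left out for simplicity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Hybrid:
    """A hybrid with exactly two parents; each parent may itself be a hybrid."""
    parent1: "Taxon"
    parent2: "Taxon"


# A taxon is either a plain name string or a Hybrid of two taxa.
Taxon = Union[str, Hybrid]


def render(taxon: Taxon, nested: bool = False) -> str:
    """Write a taxon as a formula string, bracketing nested hybrids."""
    if isinstance(taxon, str):
        return taxon
    formula = f"{render(taxon.parent1, True)} × {render(taxon.parent2, True)}"
    return f"({formula})" if nested else formula


specimen = Hybrid(
    Hybrid("Epilobium billardiereanum", "E. nerteroides"),
    "E. billardiereanum",
)
print(render(specimen))
# (Epilobium billardiereanum × E. nerteroides) × E. billardiereanum
```

In this reading the answer to the question is yes: the first parent is a hybrid formula, and the record as a whole still has only two parents.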
Should we be thinking about hybridFlag and hybridParent (without number), with hybridParent being repeated as often as required? (Not very DwC-like.) Do we need to consider terms for hybridParentRole (i.e., female parent, male parent)?
I am NOT in favour of providing atomised names.
Already answered that. Hybrids, as everything else, can only ever have two parents.
My favourite solution is still to get rid of all the hybrid fields (does that mean we agree?). If we are not going to provide atomised names, why provide atomised formulas? Especially since the formula does a better job of conveying all the information than the atomised form ever will.
So it seems the general consensus is parent1, parent2 and no atomised components. I still think that a structure for atomised names could be described and populated, which means users who can consume names like this could do so, but I agree that DwC and ABCD make this difficult.
Hobart, 2015-10-20: Resolved to remove all hybrid-related terms and deal with this in the scientific name string.
Hobart 2015-10-23: Agreed to remove hybridParent1 and hybridParent2, but retain hybridFlag
removed from terms - history to be documented in #98
HISCOM wants to be able to deliver atomised hybrid formulas, as well as the full scientific name of the hybrid's parents. Although this won’t be used by ALA, it will be useful for direct harvesting from other BioCASe providers. For this to work, we need the hybrid fields to be added to the identification element of ABCD, and not just in the HISPID extension. (From minutes of the 2012 HISCOM meeting: http://hiscom.rbg.vic.gov.au/wiki/HISCOM_2012_AGM_Canberra_minutes#11._HISPID).