hiscom / hispid

HISPID Terms

Property: previousIdentifications #91

afuchs1 closed 8 years ago

afuchs1 commented 9 years ago

Example required for output of multiple identifications.

AaronWilton commented 8 years ago

I couldn't find any previous documentation; maybe I missed it? @nielsklazenga, is there something somewhere?

How about something like this, using JSON format? (I only included three of the fields in the example, but any field of the identification group could be added.) Note that JSON is the format used in DwC for dynamicProperties.

    [
        {
            "identificationID": "47FDEB45-1949-4501-8CA3-F74CE478F7D5",
            "verbatimIdentification": "Acacia aneura",
            "identifiedBy": "John Smith"
        },
        {
            "identificationID": "0890D9A4-1F04-4869-B4A7-AB1A4448C1E1",
            "verbatimIdentification": "Acacia sp.",
            "identifiedBy": "Dave Simpson"
        }
    ]

nielsklazenga commented 8 years ago

previousIdentifications is a plain text string, not a JSON array (I fixed the JSON above, by the way). Using JSON defeats the purpose of having previousIdentifications, as you'll just be replicating the Identification History row source. We have been using previousIdentifications in AVH for more than three years, so it shouldn't be that hard to get an example.

Not so easy to find nice complete examples in our database actually. Note the examples above were not delivered as previousIdentifications, but Identification History. I just concatenate them all together, as ALA can't handle Identification History.

AaronWilton commented 8 years ago

So do we have a set order for fields? Which fields are included (all from Identification History)? What happens with missing values? These appear to be silently dropped, which must make parsing a pain... :-(

My understanding of this field is that it does duplicate the Identification History when that has to be provided as a single field. Personally, I would like to see something more strongly delimited than a CSV-type format; I assume that if you have commas in any field, it would need to be enclosed in quotes...
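A minimal sketch of the quoting concern, assuming a hypothetical record whose determiner is stored as "Smith, John" (the values are invented for illustration). It uses PHP's built-in `fputcsv()` and `str_getcsv()` to show that a CSV-aware reader recovers three fields, while a naive split on commas does not.

```php
<?php
// Hypothetical identification record; the values are invented for illustration.
$fields = array('Acacia aneura', 'Smith, John', '1992-11-05');

// Write one CSV line to an in-memory stream. fputcsv() encloses any field
// that contains the delimiter, so the comma in "Smith, John" is protected.
$handle = fopen('php://memory', 'r+');
fputcsv($handle, $fields, ',', '"', '\\');
rewind($handle);
$line = rtrim(stream_get_contents($handle), "\n");
fclose($handle);

// A CSV-aware parser returns the original three fields...
$parsed = str_getcsv($line, ',', '"', '\\');

// ...whereas splitting on every comma breaks the determiner in two.
$naive = explode(',', $line);

echo count($parsed), " fields when parsed as CSV\n";
echo count($naive), " pieces when split on commas\n";
```

The same problem applies to any delimiter that can also occur inside a value, which is why an unquoted, comma-separated string is hard to parse reliably.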

nielsklazenga commented 8 years ago

This is what I do with AVH data:

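    /**
     * Builds a human-readable previousIdentifications string from the
     * non-preferred Identification elements of a Unit (a DOM node).
     * Returns an array with 'column' and 'value' keys; returns NULL for
     * Units with a single Identification.
     */
    private function previousIdentifications($unit) {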
        $dets = array();
        $date = array();
        $list = $unit->getElementsByTagName('Identification');
        if ($list->length > 1) { // This skips all Units that have a single Identification
            // (which is assumed to be the current identification).
            foreach ($list as $item) {
                $preferredflag = $item->getElementsByTagName('PreferredFlag');
                if ($preferredflag->length>0 && in_array($preferredflag->item(0)->nodeValue, array('0', 'FALSE', 'false'))) {
                    // There is a preferred flag and it resolves to FALSE.
                    $det = array();
                    $nlist = $item->getElementsByTagName('FullScientificNameString');
                    $det['FullScientificNameString'] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : FALSE;

                    $nlist = $item->getElementsByTagName('IdentificationQualifier');
                    if ($nlist->length > 0) {
                        $det['IdentificationQualifier'] = $nlist->item(0)->nodeValue;
                        $det['IdentificationQualifierInsertionPoint'] = $nlist->item(0)->getAttribute('insertionpoint');
                    }
                    else {
                        $det['IdentificationQualifier'] = FALSE;
                        $det['IdentificationQualifierInsertionPoint'] = FALSE;
                    }

                    $nlist = $item->getElementsByTagName('NameAddendum');
                    $det['NameAddendum'] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : FALSE;

                    $nlist = $item->getElementsByTagName('HybridFlag');
                    if ($nlist->length > 0) {
                        $det['HybridFlag'] = $nlist->item(0)->nodeValue;
                        $det['HybridFlagInsertionPoint'] = $nlist->item(0)->getAttribute('insertionpoint');
                    }
                    else {
                        $det['HybridFlag'] = FALSE;
                        $det['HybridFlagInsertionPoint'] = FALSE;
                    }

                        $nlist = $item->getElementsByTagName('IdentifierRole');
                    $det['IdentifierRole'] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : FALSE;

                    $nlist = $item->getElementsByTagName('IdentifiersText');
                    $det['IdentifiersText'] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : FALSE;

                    $nlist = $item->getElementsByTagName('ISODateTimeBegin');
                    $det['IdentificationDate'] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : FALSE;
                    $date[] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : 'ZZZZ';

                    $nlist = $item->getElementsByTagName('Notes');
                    $det['IdentificationNotes'] = ($nlist->length > 0) ? $nlist->item(0)->nodeValue : FALSE;

                    $dets[] = $det;
                }
            }

            // previous identifications are sorted by identification date
            array_multisort($date, SORT_ASC, $dets); 

            $previousDets = array();
            foreach ($dets as $index => $det) {
                $prev = '';

                // Scientific name
                    $sciname = $det['FullScientificNameString'];
                if ($det['HybridFlag'] && $det['HybridFlagInsertionPoint']) {
                    $scinameBits = explode(' ', $sciname);
                    $scinameBits[$det['HybridFlagInsertionPoint']-1] = Encoding::toUTF8('×') . $scinameBits[$det['HybridFlagInsertionPoint']-1];
                    $sciname = implode(' ', $scinameBits);
                }
                if ($det['IdentificationQualifier'] && $det['IdentificationQualifierInsertionPoint']) {
                    $scinameBits = explode(' ', $sciname);
                    if ($det['IdentificationQualifierInsertionPoint'] > count($scinameBits))
                        $det['IdentificationQualifierInsertionPoint'] = count($scinameBits);
                    $spacer = ($det['IdentificationQualifier'] == '?') ? '' : ' ';
                    $scinameBits[$det['IdentificationQualifierInsertionPoint']-1] = $det['IdentificationQualifier'] . $spacer . $scinameBits[$det['IdentificationQualifierInsertionPoint']-1];
                    $sciname = implode(' ', $scinameBits);
                }
                if ($det['NameAddendum']) $sciname .= ' ' . $det['NameAddendum'];
                $prev .= $sciname;

                // Determiner
                if ($det['IdentifiersText']) {
                    $prev .= ', ';
                    $prev .= ($det['IdentifierRole'] == 'conf.') ? 'conf. ' : 'det. ';

                    $identifiers = explode(';', $det['IdentifiersText']);

                    $identifier = explode(',', $identifiers[0]);
                    $prev .= (count($identifier) > 1) ? trim($identifier[1]) . ' ' . trim($identifier[0]) : trim($identifier[0]);

                    if (count($identifiers) == 2) {
                        $identifier = explode(',', $identifiers[1]);
                        $prev .= ' & ';
                        $prev .= (count($identifier) > 1) ? trim($identifier[1]) . ' ' . trim($identifier[0]) : trim($identifier[0]);
                    }
                    elseif (count($identifiers) > 2)
                        $prev .= ' et al.';
                }

                // Determination date
                    if ($det['IdentificationDate']) {
                        $dateBits = explode('-', $det['IdentificationDate']);
                        $dateString = '';

                        $day = (isset($dateBits[2])) ? $dateBits[2] : FALSE;

                        // Months are written as lower-case Roman numerals.
                        $romanMonths = array(
                            '01' => 'i', '02' => 'ii', '03' => 'iii', '04' => 'iv',
                            '05' => 'v', '06' => 'vi', '07' => 'vii', '08' => 'viii',
                            '09' => 'ix', '10' => 'x', '11' => 'xi', '12' => 'xii',
                        );
                        $month = (isset($dateBits[1]) && isset($romanMonths[$dateBits[1]]))
                            ? $romanMonths[$dateBits[1]] : FALSE;

                        $year = $dateBits[0];

                        if ($day)
                            $dateString = "$day.$month.$year";
                        elseif ($month)
                            $dateString = "$month.$year";
                        else
                            $dateString = $year;

                        $prev .= ', ' . $dateString;

                    }

                if ($det['IdentificationNotes']) {
                    $prev .= ' (' . $det['IdentificationNotes'] . ')';
                }

                $previousDets[] = $prev;

            }

                if (!$previousDets) {
                    // No non-preferred Identifications were found, so there is nothing to report.
                    return NULL;
                }
                $previousDets = implode('; ', $previousDets);
                if (substr($previousDets, -1) != '.')
                    $previousDets .= '.';
            $ret = array (
                'column' => 'previousIdentifications',
                'value' => $previousDets,
            );
            return $ret;

        }

    }

This is, of course, only used when multiple Identifications are provided and previousIdentifications itself is not provided (or empty). When I asked HISCOM at the time, the only feedback I got was from Alison, and we went for readability.
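To illustrate the kind of string the function above produces, here is a minimal stand-alone sketch. The helper `formatDetDate()` and the two records are hypothetical: the names are borrowed from the JSON example earlier in this thread, the dates and roles are invented, and none of this is real AVH data. Only the name, determiner and date parts are reproduced; qualifiers, hybrid flags and notes are left out.

```php
<?php
// Hypothetical helper (not part of the AVH code): formats an ISO date
// (YYYY, YYYY-MM or YYYY-MM-DD) as day.month.year with a Roman-numeral month.
function formatDetDate($isoDate) {
    $romanMonths = array(
        '01' => 'i', '02' => 'ii', '03' => 'iii', '04' => 'iv',
        '05' => 'v', '06' => 'vi', '07' => 'vii', '08' => 'viii',
        '09' => 'ix', '10' => 'x', '11' => 'xi', '12' => 'xii',
    );
    $bits = explode('-', $isoDate);
    $year = $bits[0];
    $month = (isset($bits[1]) && isset($romanMonths[$bits[1]])) ? $romanMonths[$bits[1]] : FALSE;
    $day = isset($bits[2]) ? $bits[2] : FALSE;

    if ($day) return "$day.$month.$year";
    if ($month) return "$month.$year";
    return $year;
}

// Invented records, already sorted by identification date.
$dets = array(
    array('name' => 'Acacia sp.', 'role' => 'det.', 'by' => 'Dave Simpson', 'date' => '1987-03'),
    array('name' => 'Acacia aneura', 'role' => 'conf.', 'by' => 'John Smith', 'date' => '1992-11-05'),
);

$parts = array();
foreach ($dets as $det) {
    $parts[] = $det['name'] . ', ' . $det['role'] . ' ' . $det['by'] . ', ' . formatDetDate($det['date']);
}

// Join the determinations with '; ' and close the string with a full stop.
$previousIdentifications = implode('; ', $parts);
if (substr($previousIdentifications, -1) != '.')
    $previousIdentifications .= '.';

echo $previousIdentifications, "\n";
// Acacia sp., det. Dave Simpson, iii.1987; Acacia aneura, conf. John Smith, 05.xi.1992.
```

The result reads like a determination slip rather than a record, which is the trade-off described here: readable for people, not designed for parsing.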

previousIdentifications may have similar semantics to Identification History, but it does not duplicate its syntax or implementation. The content of this element is not meant to be parsed or easily parseable. I think it is not so much "has to be" as "can only be" provided in a single field. That has to do with the capabilities of the provider, not those of the consumer, so you cannot impose any requirements as to syntax.

AaronWilton commented 8 years ago

ok - then the definition needs to clearly state this is for human consumption, but I think we should still provide a recommended order of fields and records (following your code above).

We could always use an "identificationHistory" field for a structured concatenation when we want the data to be provided for easy machine parsing. Do we add provision for this now?

nielsklazenga commented 8 years ago

Yes, we've got the Identification class.

You are confounding definition with implementation. previousIdentifications is used when the information is delivered as an (unparseable) string. If you deliver the information as individual Identifications with properties, you deliver Identification History. It doesn't matter to the standard whether this is in a separate file or all concatenated into a single field.

Since we pretty much all deliver normalised Identifications to AVH (if we deliver more than the current Identification in the first place), we can agree on a format for previousIdentifications for display in AVH (which we did years ago), but that is AVH, not HISPID.

AaronWilton commented 8 years ago

true!