hiscom / hispid

HISPID Terms

Verbatim vs interpreted data #78

Closed: ben3000 closed 8 years ago

ben3000 commented 9 years ago

There is no indication in PERTH's specimen database whether a field (column, element) is allowed to be changed (interpreted data) or not (verbatim data; that supplied by the collector).

There is an acknowledged policy at PERTH of making no change to a verbatim field to maintain historical accuracy, even if there is data in the field that is clearly no longer correct.

I personally favour a clear separation of verbatim and interpreted fields to clarify the maintenance of specimen data. I would therefore like the names of verbatim elements in HISPID terms to contain the word "verbatim", as hispid:verbatimDateIdentified already does. However, Darwin Core uses verbatimLocality where we use locality, for example.

ben3000 commented 9 years ago

My bad, we do have a separate verbatimLocality, feel free to ignore that note.

nielsklazenga commented 9 years ago

+1. I think we adopted all the Darwin Core 'verbatim' terms, but it is important to highlight that we have both verbatim and interpreted fields and that they need to be kept apart. At MEL we have the same policy, but it is not followed by most people.

AaronWilton commented 9 years ago

Why is this an issue for HISPID? We need to provide for transfer of both standard and verbatim fields where appropriate. What these are called in local databases doesn't matter as long as they are mapped to the correct concept/field when transferred...

nielsklazenga commented 9 years ago

If verbatim and interpreted data is separated in HISPID, but not in the databases from which data is harvested, you can't correctly map the field in the database to a HISPID term. Pretty big issue.
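To illustrate the mapping problem described above, here is a minimal sketch in Python. The local column names (`label_locality`, `curated_locality`) and both helper functions are hypothetical; only the term names `verbatimLocality` and `locality` come from this thread. It assumes a database that does keep the two kinds of data apart, and shows what happens to a column that does not.

```python
# Hypothetical local column names mapped to the terms named in this thread.
FIELD_MAP = {
    "label_locality": "verbatimLocality",  # verbatim: as supplied by the collector
    "curated_locality": "locality",        # interpreted: may be corrected later
}


def map_record(row, field_map=FIELD_MAP):
    """Return the row keyed by term name, skipping columns with no mapping."""
    return {term: row[col] for col, term in field_map.items() if col in row}


def unmappable(columns, field_map=FIELD_MAP):
    """List local columns that cannot be assigned to a single term."""
    return sorted(col for col in columns if col not in field_map)


if __name__ == "__main__":
    row = {"label_locality": "nr Perth", "curated_locality": "Near Perth"}
    print(map_record(row))
    # A single mixed "locality" column has no unambiguous target term.
    print(unmappable(["locality", "label_locality"]))
```

A database with one `locality` column holding a mix of verbatim and corrected text ends up in the `unmappable` list: there is no way to tell, per record, which of the two terms it belongs to.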

ben3000 commented 9 years ago

Yes, I'm flagging that the trouble is mostly at my end, but that clarity in the terms we define will help.

nielsklazenga commented 9 years ago

I have the same issue here, so I expect that it is a general issue.

ben3000 commented 9 years ago

I'm also figuring that it is really just a matter of big-picture policy agreement: that this is how herbaria should cope with the competing needs for historical accuracy and rigorous data management.

ben3000 commented 9 years ago

After a chat with Karina here, it seems collector-originated data is already changed, if required, during data entry and also later when an error is discovered. Also, the database is the acknowledged primary source rather than the label, as labels are only reprinted when a major change is made in the database. It would be good to get a snapshot of actual herbarium workflow around Australasia during the HISCOM/MAHC co-meeting.

ben3000 commented 9 years ago

Typos that add value to the specimen's history are kept, but others are not. Errors in the lats/longs are not considered to add value.

nielsklazenga commented 9 years ago

As I said in the meeting, this is confounding two different things. This is not about preserving or destroying history, but about splitting notes into different fields, thereby losing context and detail and thus destroying data.

This may be a revelation to some, but was already brought up in issues #9, #23, #22, #17 and others.