Closed peterdesmet closed 6 years ago
I also created a GitHub directory for this dataset (shortname to be confirmed) with specifications in whip: https://github.com/inbo/data-publication/tree/master/datasets/dbwp-occurrences
Something went wrong with the conversion of eventDates. This implies that the columns year, month and day were correct. In the new version of the dataset eventDates were corrected.
EventDate is now YYYY-MM-DD
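Since year, month and day were the reliable columns, one way to catch a repeat of this problem is to rebuild eventDate from them and compare. This is only a minimal sketch in Python, assuming the three columns hold integers; the function names are hypothetical and not part of the actual mapping:

```python
from datetime import date


def event_date_from_parts(year: int, month: int, day: int) -> str:
    """Build an ISO 8601 (YYYY-MM-DD) eventDate from year, month and day."""
    return date(year, month, day).isoformat()


def event_date_matches(event_date: str, year: int, month: int, day: int) -> bool:
    """Return True when the date part of eventDate agrees with year, month and day."""
    # Only the first 10 characters (YYYY-MM-DD) are compared, so a trailing
    # time such as T23:00:00Z is ignored.
    return event_date[:10] == event_date_from_parts(year, month, day)


print(event_date_from_parts(2014, 7, 10))  # 2014-07-10
print(event_date_matches("2014-12-07T23:00:00Z", 2014, 8, 12))  # False
```

The second call uses one of the mismatches reported in the review, so it flags the record as inconsistent.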
For documentation:
sourceSamplingEventID:date: e.g. BI_T5/41_T5:20140710
YYYY-MM-DD
INBO:DBWP:LOC:7
INBO:DBWP:OCC:51
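The formats above could be checked automatically. The sketch below is plain Python rather than a whip specification, and the patterns are assumptions inferred from the examples in this thread (a numeric suffix for locationID and occurrenceID, a date-only eventDate); the dictionary and function names are hypothetical:

```python
import re

# Assumed patterns, inferred from the examples in this thread (not an official spec).
FIELD_PATTERNS = {
    "locationID": re.compile(r"INBO:DBWP:LOC:\d+"),
    "occurrenceID": re.compile(r"INBO:DBWP:OCC:\d+"),
    "eventDate": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def is_valid(field: str, value: str) -> bool:
    """Return True when value fully matches the assumed pattern for field."""
    return FIELD_PATTERNS[field].fullmatch(value) is not None


print(is_valid("locationID", "INBO:DBWP:LOC:7"))    # True
print(is_valid("occurrenceID", "INBO:DBWP:OCC:51"))  # True
print(is_valid("locationID", "DBWP:LOCID7"))         # False
```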
To do:
locality has Vácrátót. Was correct before.
54.66666 instead of 54.66000
year, month, day
specificEpithet empty for Onthophagus sp., Chilothorax sp., and Aphodius sp.
genus + specificEpithet: Agrilinus rufa and Bodilopsis rufa. I assume first one is not correct?
And once published we should not forget to add DOI as datasetID (with https://)
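For the datasetID step, a small helper could make sure the DOI always ends up in https:// form. This is a hedged sketch: the function name is hypothetical, the DOI in the example is a placeholder (the real one does not exist until publication), and it assumes the standard doi.org resolver:

```python
def doi_as_dataset_id(doi: str) -> str:
    """Return a DOI as an https://doi.org/ URL, suitable for datasetID."""
    doi = doi.strip()
    # Strip any existing scheme or resolver prefix so the result is uniform.
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


# Placeholder DOI, for illustration only.
print(doi_as_dataset_id("doi:10.1234/example"))  # https://doi.org/10.1234/example
```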
Agrilinus rufus is an old synonym for Bodilopsis rufa, so this has been corrected to Bodilopsis rufa
Moved dataset from http://data.inbo.be/ipt/resource?r=dung-beetles-of-the-western-palaearctic-events to http://data.inbo.be/ipt/resource?r=dbwp-events.
Process:
Meanwhile corrected:
id from data files (was added in the dwca)
Here's my review of the DwC mapping of the Dung beetles dataset:
DBWP seems to be used, so I would opt for dbwp-events instead of dwca-dung-beetles-of-the-western-palaearctic
Event Core
eventID: values are unique, but they have a non-ISO date and don't fit the pattern of locationID and occurrenceID: B_KH_C1_Mon_26/09/2015. I would maybe opt for INBO:DBWP:EVENT:number, assuming that number can be included in the source dataset. Note that this field needs to be remapped in Event Core + Occurrence Extension.
eventDate: are these supposed to include times? E.g. 2015-09-25T22:00:00Z. I only encountered 22h and 23h. Format is correct ISO.
year, month, day: do not correspond with eventDate. I would remove those three fields:
2015-09-06 22:00:00 -> 2014 6 26
2014-12-07 23:00:00 -> 2014 8 12
locationID: I would opt for the pattern INBO:DBWP:LOC:number instead of DBWP:LOCIDnumber
countryCode: required field by GBIF, would be good to add
verbatimElevation: these elevations are unambiguous and do not provide more info than min/maxElevationInMeters. Would remove the field
coordinateUncertaintyInMeters: is it possible to make an educated guess?
decimalLatitude/Longitude: best to round to 5 decimals + have trailing zeros, e.g. 51.36000
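The coordinate formatting suggested above (5 decimals, trailing zeros kept) can be sketched in Python as follows. It assumes coordinates arrive as text from the data files; the function name is hypothetical, and half-up rounding is my choice, not something specified in the review:

```python
from decimal import Decimal, ROUND_HALF_UP


def format_coordinate(value: str) -> str:
    """Round a coordinate to 5 decimals and keep trailing zeros."""
    # Decimal avoids binary floating-point surprises and quantize()
    # pads with zeros up to the requested precision.
    rounded = Decimal(value).quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)
    return str(rounded)


print(format_coordinate("51.36"))      # 51.36000
print(format_coordinate("54.666666"))  # 54.66667
```

Returning a string rather than a float matters here: a float would drop the trailing zeros again when written to the data file.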
Occurrence Extension
A lot of fields are repeated from the Event Core, which is harder to maintain and can cause discrepancies. I would remove the following fields:
Also:
occurrenceRemarks: the bait is really samplingProtocol information, so I would join that info with samplingProtocol in the Event Core, to create:
occurrenceID: currently has the format INBO:DBWP:ID1. I would opt for INBO:DBWP:OCC:number
genus and specificEpithet, but doesn't hurt either, as some taxa might be new?
Note: all the IDs I propose start with INBO:. If we should consider the dataset a UGent dataset (more correct?), then we should update those IDs to start with UGent: and the institutionCode should be UGent as well.
I did not review the metadata. Looks like a very cool dataset! 👍