riparias / rato-occurrences

DwC mapping of RATO vwz occurrences

Scientific names are built upon GBIF codes, but they don't always match the `Soort` column #207

Open PietrH opened 3 weeks ago

PietrH commented 3 weeks ago

Currently, the species in the Darwin Core occurrence output is determined from the `GBIF_Code` column in the raw data, not from the `Soort` column.

However, if we look up the `Soort` values for the records coded as Rattus norvegicus, we also get rabbits and chickens, which makes me think that the `GBIF_Code` field isn't 100% reliable.

```r
filter(raw_data, GBIF_Code == 2439261) %>% count(Soort, sort = TRUE)
```

```
Soort                          n
Bruine rat bak/buis        87074
Kippen                         8
Andere (soort vermelden):      3
Konijnen                       1
Steenmarter                    1
```

However, the `Soort` column isn't necessarily more reliable either, especially the "Other" value (`Andere (soort vermelden):`).
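To see how widespread this is, the same check could be run for every code at once. A rough sketch, assuming `raw_data` has the `GBIF_Code` and `Soort` columns used above; the toy rows and the code `9999999` are made up for illustration and are not real RATO records:

```r
library(dplyr)

# Toy stand-in for raw_data. Illustrative values only, not real RATO records.
# 9999999 is a made-up placeholder code, not a real GBIF taxonKey.
raw_data <- tibble(
  GBIF_Code = c(2439261, 2439261, 2439261, 2439261, 9999999),
  Soort = c(
    "Bruine rat bak/buis", "Bruine rat bak/buis",
    "Kippen", "Konijnen", "Steenmarter"
  )
)

# Count records per (GBIF_Code, Soort) pair, then record how many
# distinct Soort values share each code.
soort_per_code <- raw_data %>%
  count(GBIF_Code, Soort, name = "n_records") %>%
  group_by(GBIF_Code) %>%
  mutate(n_soort = n_distinct(Soort)) %>%
  ungroup()

# Keep only codes that are attached to more than one Soort value.
conflicts <- soort_per_code %>%
  filter(n_soort > 1) %>%
  arrange(GBIF_Code, desc(n_records))

print(conflicts)
```

Any code that shows up in `conflicts` would need a manual look, since the counts alone don't tell us which of the two columns is wrong.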

@damianooldoni I remember you mentioning you had trouble with this in the past. Do you have a bit more context? What do you think we should do?

damianooldoni commented 2 weeks ago

As mentioned in #24, I would map them manually. In other words, I would not use the GBIF codes they provide. They started adding them at the request of @timadriaens some years ago. At that time their data were not yet on GBIF, and having a GBIF Backbone taxonKey made it practical to use RATO data and join them with other data sources.
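A manual mapping could look roughly like the sketch below. The lookup table `soort_mapping` and its `scientificName` column are hypothetical names, the two entries are examples only, and the real table would have to be filled in and checked by someone with species knowledge:

```r
library(dplyr)

# Hypothetical hand-maintained lookup from Soort to scientific name.
# Example entries only; the full mapping needs expert review.
soort_mapping <- tibble(
  Soort = c("Bruine rat bak/buis", "Steenmarter"),
  scientificName = c("Rattus norvegicus", "Martes foina")
)

# Toy stand-in for raw_data. Illustrative values only, not real RATO records.
raw_data <- tibble(
  Soort = c(
    "Bruine rat bak/buis", "Steenmarter", "Andere (soort vermelden):"
  ),
  GBIF_Code = c(2439261, 2439261, 2439261)
)

# Derive the name from Soort and ignore the provided GBIF_Code.
mapped <- raw_data %>%
  left_join(soort_mapping, by = "Soort")

# Soort values without a mapping stay NA, so they can be listed for review.
unmapped <- mapped %>%
  filter(is.na(scientificName)) %>%
  count(Soort, sort = TRUE)

print(unmapped)
```

With a left join, records like `Andere (soort vermelden):` are not silently assigned a species; they end up in `unmapped`, and we can decide per value how to handle them.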

timadriaens commented 2 weeks ago

Indeed, manual mapping seems the way to go. They do not have the technical and species knowledge to always be aware of the correct codes.