This table gives an overview of the current GBIF name matching status:
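| match | fishes | harmonia | macroinvertebrates | plants | rinse | rinse-annex-b | wrims | sum |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| exact match with gbifapi_scientificName | | | 1 | 1294 | 36 | 1 | | 1332 |
| exact match with gbifapi_canonicalName | 22 | 130 | 66 | 3 | 6093 | 21 | 175 | 6510 |
| EXACT 100% | | 4 | | 849 | 211 | 6 | | 1070 |
| EXACT < 100% | | | | 72 | 20 | 4 | | 96 |
| FUZZY | 1 | | 2 | 15 | 73 | 4 | | 95 |
| HIGHERRANK | | 3 | 4 | 177 | 224 | 7 | 25 | 440 |
| NO OR DOUBLE MATCH | | 3 | | | 4 | 2 | | 9 |
| sum | 23 | 140 | 73 | 2410 | 6661 | 45 | 200 | 9552 |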
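For reference, the rows from `EXACT 100%` down can be read as buckets over the `matchType` and `confidence` fields that the GBIF species match API returns. The sketch below is only an illustration of that reading: the function name `classify_match` is hypothetical, and lumping every other outcome into `NO OR DOUBLE MATCH` is an assumption, not necessarily how the table above was produced.

```python
def classify_match(response: dict) -> str:
    """Map a GBIF species-match response (already parsed to a dict) to a row label.

    Assumes the response carries `matchType` (EXACT, FUZZY, HIGHERRANK, NONE)
    and an integer `confidence` between 0 and 100.
    """
    match_type = response.get("matchType", "NONE")
    confidence = response.get("confidence", 0)

    if match_type == "EXACT":
        # Split exact matches by whether GBIF is fully confident.
        return "EXACT 100%" if confidence == 100 else "EXACT < 100%"
    if match_type in ("FUZZY", "HIGHERRANK"):
        return match_type
    # Assumption: anything else (e.g. NONE) is reported as no/double match.
    return "NO OR DOUBLE MATCH"


if __name__ == "__main__":
    print(classify_match({"matchType": "EXACT", "confidence": 98}))
```

The two "exact match with gbifapi_..." rows are not covered here, since they compare names against fields that were already stored rather than bucketing a fresh API response.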
Some observations:
- The first 3 categories can be considered OK, which is 93.3% of the dataset (8912 of 9552 records)! The only caveat is that we have to trust the accepted names GBIF gives for synonyms (745 records + 15 doubtful), which we don't always do: e.g. Tripolium pannonicum is not a synonym of A. salignus.
- The 96 `EXACT < 100%` matches will have to be examined case by case.
- The 95 `FUZZY` matches are probably typos, and are addressed in #41.
- The 440 `HIGHERRANK` matches are mostly plants, and a lot of them are hybrids. Chances are we can only get about half of those to match in GBIF.
- And then there are 9 records with no or double match: those are some viruses and names in Harmonia that appear twice in GBIF (a bug that will be fixed in April).
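The percentages above can be rechecked from the sum column of the table. This is just a quick arithmetic sketch; the variable names are made up for illustration, and the numbers are copied from the table.

```python
# Totals per match category, copied from the "sum" column of the table.
counts = {
    "exact match with gbifapi_scientificName": 1332,
    "exact match with gbifapi_canonicalName": 6510,
    "EXACT 100%": 1070,
    "EXACT < 100%": 96,
    "FUZZY": 95,
    "HIGHERRANK": 440,
    "NO OR DOUBLE MATCH": 9,
}

# The first three categories are the ones considered OK.
ok_categories = list(counts)[:3]

total = sum(counts.values())
ok = sum(counts[name] for name in ok_categories)
to_review = total - ok
ok_share = round(100 * ok / total, 1)

print(f"{ok} of {total} records OK ({ok_share}%), {to_review} left to review")
```

So roughly 640 records still need attention, and the `HIGHERRANK` group accounts for most of them.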
@timadriaens, how do you want to prioritize going forward?