This repository contains an archived 2016 proof of concept for creating a checklist of alien species in Belgium from different sources. Many of the concepts tried here are used for the Global Register of Introduced and Invasive Species - Belgium, an open, reproducible checklist of alien species in Belgium created as part of the TrIAS project. See https://github.com/trias-project/unified-checklist for more information.
Choose and download source datasets
Format the data to tab-delimited values with Open Refine
Define common terms for all source datasets
Map the source datasets to the common terms schema, using this mapping file.
Concatenate all source datasets using:
csvcat --skip-headers data/interim/fishes/data-with-common-terms.tsv data/interim/harmonia/data-with-common-terms.tsv data/interim/macroinvertebrates/data-with-common-terms.tsv data/interim/plants/data-with-common-terms.tsv data/interim/rinse/data-with-common-terms.tsv data/interim/rinse-annex-b/data-with-common-terms.tsv data/interim/t0/data-with-common-terms.tsv data/interim/wrims/data-with-common-terms.tsv > data/interim/concatenated-checklist.tsv
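For readers without csvcat installed, the concatenation step can be approximated in plain Python. This is a minimal sketch, not a reproduction of csvcat's exact behaviour: it assumes every source file is tab-delimited and starts with the same header row (the common terms), and the function name `concatenate_tsv` is hypothetical.

```python
import csv
from pathlib import Path


def concatenate_tsv(sources, destination):
    """Append the data rows of all sources to destination, writing the header once.

    Assumes (not confirmed by the repository) that all sources share one header.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(destination, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for source in sources:
            with open(source, newline="", encoding="utf-8") as handle:
                reader = csv.reader(handle, delimiter="\t")
                source_header = next(reader, None)
                if source_header is None:
                    continue  # skip empty files
                if header is None:
                    # First non-empty file defines the header of the output.
                    header = source_header
                    writer.writerow(header)
                elif source_header != header:
                    raise ValueError(f"{source} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    # Dataset directory names taken from the csvcat command above.
    names = ["fishes", "harmonia", "macroinvertebrates", "plants",
             "rinse", "rinse-annex-b", "t0", "wrims"]
    sources = [Path("data/interim") / name / "data-with-common-terms.tsv"
               for name in names]
    concatenate_tsv(sources, "data/interim/concatenated-checklist.tsv")
```

Raising an error on a header mismatch is a design choice of this sketch: it surfaces a source dataset that was not fully mapped to the common terms instead of silently misaligning columns.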
Up to this point, all steps are repeatable. The remaining steps are not.
Copy the concatenated file to data/interim/verified-checklist.tsv.
Add a number of columns.
Define controlled vocabularies for the terms we're interested in.
Map the current values to controlled vocabularies, using the -mapping files in the vocabularies directory.
Match scientific names to the GBIF backbone taxonomy (assuming inbo-pyutils is locally available):
python ../inbo-pyutils/gbif/gbif_name_match/gbif_species_name_match.py data/interim/verified-checklist.tsv data/interim/verified-checklist.tsv --update --namecol scientificName --kingdomcol kingdom --strict --api_terms usageKey scientificName canonicalName status rank matchType
Automatically update the nameMatchValidation column for synonyms that have been verified:
python ../inbo-pyutils/gbif/verify_synonyms/verify_synonyms.py data/interim/verified-checklist.tsv data/interim/verified-checklist.tsv --synonym_file data/vocabularies/verified-synonyms.tsv --usagekeycol gbifapi_usageKey --acceptedkeycol gbifapi_acceptedKey --taxonomicstatuscol gbifapi_status --outputcol nameMatchValidation
Review any remaining issues (see this procedure for updating names).
Aggregate the checklist with this notebook to create the final checklist.
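The step of mapping current values to controlled vocabularies amounts to a lookup from each raw value to its controlled term. The sketch below illustrates the idea only. The layout of the mapping files is not described here, so the mapping is passed in as a plain dictionary, and the function name `apply_vocabulary`, the column name `origin` and the example values are all hypothetical.

```python
def apply_vocabulary(rows, column, mapping):
    """Replace raw values in `column` with their controlled-vocabulary term.

    rows: list of dicts (for example, as read with csv.DictReader).
    mapping: dict from raw value to controlled term (hypothetical layout).
    Returns (mapped_rows, unmapped_values); the input rows are not modified.
    """
    mapped_rows = []
    unmapped = set()
    for row in rows:
        value = (row.get(column) or "").strip()
        new_row = dict(row)  # copy, so the original row stays untouched
        if value in mapping:
            new_row[column] = mapping[value]
        elif value:
            # Keep the raw value and report it, so it can be reviewed manually.
            unmapped.add(value)
        mapped_rows.append(new_row)
    return mapped_rows, sorted(unmapped)


if __name__ == "__main__":
    # Hypothetical example data, not taken from the checklist.
    rows = [
        {"scientificName": "Acer negundo", "origin": "N-Am."},
        {"scientificName": "Procambarus clarkii", "origin": "unclear"},
    ]
    mapped, unmapped = apply_vocabulary(rows, "origin", {"N-Am.": "North America"})
    print(mapped)
    print("Values without a mapping:", unmapped)
```

Returning the unmapped values separately, rather than blanking them, keeps the manual review step explicit: each reported value needs either a new mapping entry or a correction in the data.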