trias-project / alien-species-checklist

Proof of concept for a checklist of alien species in Belgium

Rationale

This repository contains an archived 2016 proof of concept for creating a checklist of alien species in Belgium from different sources. Many of the concepts tried here are used for the Global Register of Introduced and Invasive Species - Belgium, an open, reproducible checklist of alien species in Belgium created as part of the TrIAS project. See https://github.com/trias-project/unified-checklist for more information.

Process

  1. Choose and download source datasets

  2. Format the data to tab-delimited values with Open Refine

  3. Define common terms for all source datasets

  4. Map the source datasets to the common terms schema, using this mapping file.

  5. Concatenate all source datasets using:

    csvcat --skip-headers data/interim/fishes/data-with-common-terms.tsv data/interim/harmonia/data-with-common-terms.tsv data/interim/macroinvertebrates/data-with-common-terms.tsv data/interim/plants/data-with-common-terms.tsv data/interim/rinse/data-with-common-terms.tsv data/interim/rinse-annex-b/data-with-common-terms.tsv data/interim/t0/data-with-common-terms.tsv data/interim/wrims/data-with-common-terms.tsv > data/interim/concatenated-checklist.tsv

    Up to this point, all steps are repeatable. The remaining steps are not.

  6. Copy concatenated file to data/interim/verified-checklist.tsv.

  7. Add a number of columns.

  8. Define controlled vocabularies for the terms we're interested in.

  9. Map the current values to controlled vocabularies, using the -mapping-files in the vocabularies directory.

  10. Match scientific names to the GBIF backbone taxonomy (assuming inbo-pyutils is locally available):

    python ../inbo-pyutils/gbif/gbif_name_match/gbif_species_name_match.py data/interim/verified-checklist.tsv data/interim/verified-checklist.tsv --update --namecol scientificName --kingdomcol kingdom --strict --api_terms usageKey scientificName canonicalName status rank matchType

  11. Automatically update nameMatchValidation column for synonyms that have been verified:

    python ../inbo-pyutils/gbif/verify_synonyms/verify_synonyms.py data/interim/verified-checklist.tsv data/interim/verified-checklist.tsv --synonym_file data/vocabularies/verified-synonyms.tsv --usagekeycol gbifapi_usageKey --acceptedkeycol gbifapi_acceptedKey --taxonomicstatuscol gbifapi_status --outputcol nameMatchValidation

  12. Review any remaining issues (see this procedure for updating names).

  13. Aggregate the checklist with this notebook to create the final checklist.
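
For readers without csvcat, the concatenation in step 5 can be approximated in plain Python. This is a minimal sketch, not the tool used in this repository. It assumes that every data-with-common-terms.tsv file starts with the same header row and that the intent of --skip-headers is to keep that header once. The function name concatenate_tsv and the SOURCES list are illustrative; the source names are taken from the command in step 5.

```python
import csv

# Source directory names, as listed in the csvcat command of step 5.
SOURCES = ["fishes", "harmonia", "macroinvertebrates", "plants",
           "rinse", "rinse-annex-b", "t0", "wrims"]


def concatenate_tsv(paths, out_path):
    """Concatenate tab-delimited files, writing the header only once.

    Assumption: all files share the same header row (they were mapped to
    the common terms schema). A differing header raises ValueError.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: header differs from the first file")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

To reproduce step 5, the input paths would be built as data/interim/<source>/data-with-common-terms.tsv for each name in SOURCES, with data/interim/concatenated-checklist.tsv as the output path. The header check is a design choice of this sketch: it fails loudly if one source was not mapped to the common terms.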
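
The name matching in step 10 relies on the GBIF species match API. The sketch below shows roughly what a single lookup involves; it is not the inbo-pyutils implementation, whose internals are not described here. The helper names (build_match_url, extract_terms, match_name) are hypothetical. The gbifapi_ column prefix is inferred from the column names used in step 11, and the list of terms is the one passed to --api_terms in step 10.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"

# Terms requested in step 10 via --api_terms.
API_TERMS = ("usageKey", "scientificName", "canonicalName",
             "status", "rank", "matchType")


def build_match_url(name, kingdom=None, strict=True):
    """Build the request URL for one scientific name."""
    params = {"name": name, "strict": "true" if strict else "false"}
    if kingdom:
        params["kingdom"] = kingdom
    return f"{GBIF_MATCH_URL}?{urlencode(params)}"


def extract_terms(response, terms=API_TERMS, prefix="gbifapi_"):
    """Keep only the requested terms, prefixed for use as column names.

    Terms missing from the response (for example when nothing matched)
    are returned as empty strings.
    """
    return {f"{prefix}{term}": response.get(term, "") for term in terms}


def match_name(name, kingdom=None, strict=True, timeout=30):
    """Query the GBIF API for one name (requires network access)."""
    url = build_match_url(name, kingdom, strict)
    with urlopen(url, timeout=timeout) as resp:
        return extract_terms(json.load(resp))
```

Calling match_name once per row, with the values of the scientificName and kingdom columns, would yield the gbifapi_* values to write back into data/interim/verified-checklist.tsv.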

Contributors

List of contributors

License

MIT License