Code and data for plant name synonym expansion and fuzzy name matching. The most recent use is in the "mycorrhizal soil/climate" analyses; see the plant_gbif repository.
We use the classification.txt data file downloaded from World Flora Online. /data/WFO_2_synonym_list.R produces a reformatted file for /scripts/synonymize.py.
/data/name-lists/: lists of names from other sources to match/scrub against. Provided as examples.
The /scripts/synonymize.py utility creates a synonym table from World Flora Online. The idea is to hand the script a list of canonical names (such as the species names associated with your trait data) and obtain a list of names that includes those and all of their synonyms. For example:
python synonymize.py -b -a expand canonical_names.txt > expanded_names.txt
The command above uses the -b option to indicate that we want to use only binomials and ignore three-part names; the -a option gives the action to perform (expand).
The merge action allows merging to a canonical list of names (not necessarily World Flora Online "accepted" names, although that is the default). The result will be of the same length as the input expanded names list, but every name will be replaced with the corresponding canonical name. By lining up the expanded list with the merged result, one can create a lookup table that allows converting from any synonym to a canonical name. You will always want to merge back to your original canonical names list:
python synonymize.py -b -a merge -c canonical_names.txt expanded_names.txt > ../results/merged-names.txt
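The "lining up" step described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the repository: the function name build_lookup is hypothetical, and it assumes that both files hold one name per line in the same order, as the description of the merge action implies.

```python
def build_lookup(expanded_path, merged_path):
    """Map each name in the expanded list to the canonical name on the same line.

    Assumes one name per line and identical ordering in both files
    (hypothetical helper, not part of synonymize.py).
    """
    with open(expanded_path, encoding="utf-8") as f:
        expanded = f.read().splitlines()
    with open(merged_path, encoding="utf-8") as f:
        merged = f.read().splitlines()
    if len(expanded) != len(merged):
        raise ValueError("expanded and merged lists differ in length")
    # Pair the two lists line by line; skip blank lines.
    return {name.strip(): canon.strip()
            for name, canon in zip(expanded, merged)
            if name.strip()}
```

With the files produced by the two commands above, build_lookup("expanded_names.txt", "../results/merged-names.txt") would return a dictionary keyed by every expanded name.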
See the docstring and usage for more information. Try:
python synonymize.py -h
This could be sped up by caching the lookup dictionaries. As it works now, the entire lookup data set is re-read each time the program is run. But it works.
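One possible way to add such a cache is sketched below, using the standard pickle module. This is only an illustration under stated assumptions: load_cached and its build argument are hypothetical names, and how synonymize.py actually constructs its dictionaries is not shown here, so the caller supplies that step as a function.

```python
import os
import pickle


def load_cached(source_path, cache_path, build):
    """Return build(source_path), reusing a pickled copy when it is up to date.

    `build` is any function that parses the source file into lookup
    dictionaries (hypothetical; stands in for the script's own parsing).
    """
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if cache_is_fresh:
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache missing or older than the source: rebuild and store it.
    data = build(source_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

Comparing modification times means the cache is rebuilt automatically whenever the source data file is replaced. Pickle files should only be loaded from locations you trust.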
The expand_names.sh script provides an example of usage, expanding the canonical names from the Tank tree (see Zanne et al. 2013). This is just an example.
fuzzy_match.py provides the fuzzy_match_name_list() function.
gbif_lookup.py is an example script that demonstrates how to use the fuzzy_match_name_list function. The script creates a lookup table from the expanded Tank et al. tree names created by expand_tanknames.sh to the names found in the GBIF database according to /data/names-lists/gbif-occurrences-names.txt.
The code in the plant_gbif repository provides more complete examples of how to use taxon-name-utils for large name matching tasks.
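For readers unfamiliar with fuzzy name matching, the general idea can be sketched with Python's standard difflib module. This is a stand-in, not the repository's fuzzy_match_name_list(): the signature and algorithm of that function are not shown here, and the name closest_names and the cutoff value are assumptions made for illustration.

```python
import difflib


def closest_names(queries, candidates, cutoff=0.9):
    """Map each query name to its most similar candidate, or None.

    Uses difflib's similarity ratio as a stand-in for the repository's
    own matching logic; the cutoff of 0.9 is an arbitrary choice.
    """
    matches = {}
    for name in queries:
        hits = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
        matches[name] = hits[0] if hits else None
    return matches
```

This sketch compares every query with every candidate, so it is only practical for small lists.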