make input names consistent across main functions

dfalster commented 1 year ago

The first argument for align_taxa is called original_name, while for create_taxonomic_update_lookup it's taxa.

Argument names need to be consistent across the main functions (align_taxa, create_taxonomic_update_lookup and update_taxonomy), and also with their outputs.

For this argument I suggest original_name is better.
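
One way a rename like this could be introduced without breaking existing calls is sketched below in base R. This is only an illustration: `lookup_sketch()` is a hypothetical stand-in, not the real create_taxonomic_update_lookup, and its body is a placeholder. It takes original_name as the first argument and still accepts the old taxa name with a warning.

```r
# Hypothetical stand-in for a function whose first argument is being renamed
# from `taxa` to `original_name`; this is not APCalign's actual code.
lookup_sketch <- function(original_name = NULL, taxa = NULL) {
  if (!is.null(taxa)) {
    # Old argument name supplied: warn, then fall back to it if needed.
    warning("`taxa` is deprecated; use `original_name` instead.", call. = FALSE)
    if (is.null(original_name)) original_name <- taxa
  }
  if (is.null(original_name)) stop("`original_name` must be supplied.")
  # Placeholder body: echo the input under the agreed output column name.
  data.frame(original_name = original_name, stringsAsFactors = FALSE)
}

lookup_sketch(c("Genus species", "Genus sp."))  # positional calls are unaffected
lookup_sketch(taxa = "Genus species")           # old name still works, with a warning
```

Keeping original_name in first position means positional calls behave the same before and after the rename, and the output column carries the same name as the input argument.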

ehwenk commented 1 year ago

Some variable names to consider renaming / make more consistent:

taxonomic_ref, taxonomic_reference - reference APC, APC accepted, APC known, APNI. This is not a term in DarwinCore, where the references are the primary taxonomic literature references (NamingAuthority). In NSL it is called dataset (too ambiguous). I'm happy with taxonomic_reference, which is what we use in traits.build. From traits.build schema: taxon_rank: The taxonomic rank of the most specific name in the scientific name.
taxonRank, taxon_rank, taxonomic_resolution - this is the rank of a name. In traits.build terms, it is also the resolution to which a name can be aligned, which is the conflict. They are slightly different, but I think it is fine to go with taxon_rank. This is an official term in DarwinCore and NSL. In traits.build, we use both: taxonomic_resolution within the taxonomic_updates table, defined as "The rank of the most specific taxon name (or scientific name) to which a submitted original name resolves", and taxon_rank in the taxon table, defined as "The taxonomic rank of the most specific name in the scientific name".
binomial, trinomial - These terms, during matching at least, are slight misnomers, because they are actually simply "first two words" and "first three words", once all filler words (sp, spp, var, form, etc.) are removed by strip_names_2(). There are lots of phrase names (i.e. species level) that are aligned to the trinomial column. But I'm also happy to keep what we have, because in the traits.build output, the columns are only filled in if the taxon_rank matches; it is just the intermediary matching step where the columns don't perfectly match the expected taxon_rank.
ID as described in issue #117 should be split into scientific_name_ID and taxon_ID for a number of reasons. Most importantly, 1) names and taxa are distinct concepts and 2) scientific_name_ID provides the link between APC & APNI scientific names.
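
To illustrate the binomial / trinomial point above, here is a minimal base R sketch of "first two words" and "first three words" once filler words are dropped. `first_n_words()` is a hypothetical helper, and the filler list is just the four examples named above; this is not the actual strip_names_2() implementation, which may handle more cases. The example names are made-up placeholders.

```r
# Hypothetical helper, not part of APCalign: keep the first `n` words of a
# name after dropping filler words (an illustrative, incomplete list).
first_n_words <- function(name, n, fillers = c("sp", "spp", "var", "form")) {
  # Split on whitespace; strip a trailing full stop so "var." matches "var".
  words <- strsplit(trimws(name), "\\s+")[[1]]
  bare <- sub("\\.$", "", words)
  kept <- words[!(tolower(bare) %in% fillers)]
  paste(head(kept, n), collapse = " ")
}

first_n_words("Genus species var. variety", 2)
first_n_words("Genus species var. variety", 3)
# A made-up phrase name still yields three words, although its rank is species.
first_n_words("Genus sp. Some Locality (A.Collector 123)", 3)
```

The last call shows why the intermediate "trinomial" column can hold names that are not trinomials in the taxonomic sense.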

ehwenk commented 1 year ago

In terms of input parameters for the functions, I think they are largely consistent - the exception is that the term original_name is used in align_taxa versus taxa in match_taxa and create_taxonomic_update_lookup. I can't decide if I think these should be the same term or not.

traitecoevo / APCalign

make input names consistent across main functions #106