ehwenk closed this issue 1 year ago.
See also issue #120.
@dfalster, I don't think we're missing an option. Splits are one-to-many merges, while alternate taxonomic status values are many-to-one joins. There isn't anything else to return. See the latest comment in issue #120 and also:
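To make the row-count argument concrete, here is a minimal base-R sketch using toy lookup tables. The tables and their two-column layout are illustrative assumptions, not real APC rows. A synonym maps its canonical name to a single accepted name, so a left join keeps one row per input name. A split maps one canonical name to several accepted names, so the same join multiplies rows.

```r
# Toy lookup tables (illustrative only, not real APC rows).
# Synonym case: each canonical name has exactly one accepted name (many-to-one).
synonym_lookup <- data.frame(
  canonical_name = c("Selaginella australiensis", "Selaginella leptostachya"),
  accepted_name  = c("Selaginella australiensis", "Selaginella australiensis"),
  stringsAsFactors = FALSE
)

# Split case: one canonical name points to several accepted names (one-to-many).
split_lookup <- data.frame(
  canonical_name = rep("Acacia aneura", 3),
  accepted_name  = c("Acacia aneura", "Acacia minyura", "Acacia paraneura"),
  stringsAsFactors = FALSE
)

lookup <- rbind(synonym_lookup, split_lookup)

# Two input names to align
input <- data.frame(
  aligned_name = c("Selaginella leptostachya", "Acacia aneura"),
  stringsAsFactors = FALSE
)

# Left join: the synonym yields 1 row, the split yields 3
joined <- merge(input, lookup, by.x = "aligned_name", by.y = "canonical_name", all.x = TRUE)
table(joined$aligned_name)
# counts: Acacia aneura 3, Selaginella leptostachya 1
```

Only the split propagates rows, which is the distinction the comment above draws between `return_splits` and a separate `return_all`.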
```r
library(dplyr)

# Assumes `resources` (loaded APC tables) and `data` (with an `aligned_name` column) already exist.

# With `alternate taxonomic status`, a single accepted_name_usage matches to multiple canonical names.
# This is a one-x-to-one-y join, because each canonical name (i.e. aligned name) has only a single accepted name,
# so (I might be wrong) I don't think this should result in any propagation of rows,
# so there isn't a "return_all" option that is separate from "return_splits".
# Please poke holes in this argument.
# "Selaginella australiensis" is a good example, with 9 synonyms.
# For this, collapsing `alternative_taxonomic_status_aligned` must also be performed on the `aligned_name`, not the `accepted_name`.
# For instance, if the `aligned_name` is one of the taxonomic synonyms, the `taxonomic_status_aligned` is that synonym's
# `taxonomic_status`, while `taxonomic_status` is accepted, with no alternatives.
# It is only if the `aligned_name` is already the `accepted_name` that it is appropriate to report
# alternate taxonomic status values (I think; happy to be told I'm wrong).
# So I think, really, this is almost a mutate on `resources$APC` before/as it is being joined during update_taxonomy.

collapsed_taxonomic_status <-
  resources$APC %>%
  dplyr::select(canonical_name, accepted_name_usage, accepted_name_usage_ID, taxon_ID, taxonomic_status) %>%
  dplyr::group_by(accepted_name_usage_ID) %>%
  dplyr::arrange(taxonomic_status) %>% ## XX replace with proper function with `my_order`
  dplyr::mutate(
    # Distinct non-accepted statuses within the accepted concept, or NA if none
    alternative_taxonomic_status_aligned =
      taxonomic_status %>%
      unique() %>%
      subset(., . != "accepted") %>%
      paste0(collapse = " | ") %>%
      dplyr::na_if("")
  ) %>%
  dplyr::slice(1) %>%
  dplyr::ungroup()

data %>%
  dplyr::left_join(
    collapsed_taxonomic_status %>%
      dplyr::rename(aligned_name = canonical_name) %>%
      dplyr::select(aligned_name, alternative_taxonomic_status_aligned),
    by = "aligned_name"
  )
```
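For anyone without `resources` loaded, this base-R sketch reproduces the same collapse-by-accepted-ID logic on a toy table. The values are illustrative, not taken from the APC, and `status_order` is a hypothetical stand-in for the `my_order` ranking mentioned in the `XX` comment. With this grouping, the accepted `Acacia minyura` row picks up `pro parte misapplied` from the `Acacia aneura` row, and synonyms such as `Selaginella leptostachya` get no value after the join.

```r
# Toy APC-like table (illustrative values, not real APC rows)
apc <- data.frame(
  canonical_name = c("Selaginella australiensis", "Selaginella leptostachya",
                     "Acacia minyura", "Acacia aneura"),
  accepted_name_usage_ID = c("id_sel", "id_sel", "id_min", "id_min"),
  taxonomic_status = c("accepted", "taxonomic synonym",
                       "accepted", "pro parte misapplied"),
  stringsAsFactors = FALSE
)

# Hypothetical ranking standing in for `my_order`; unlisted statuses sort last (NA)
status_order <- c("accepted", "taxonomic synonym", "pro parte misapplied", "misapplied")

collapse_by_accepted_id <- function(apc, status_order) {
  # Sort by group, then by explicit status rank rather than alphabetically
  apc <- apc[order(apc$accepted_name_usage_ID, match(apc$taxonomic_status, status_order)), ]
  rows <- lapply(split(apc, apc$accepted_name_usage_ID), function(g) {
    # Distinct non-accepted statuses in this accepted concept, or NA if none
    others <- setdiff(unique(g$taxonomic_status), "accepted")
    keep <- g[1, c("canonical_name", "accepted_name_usage_ID")]  # like slice(1)
    keep$alternative_taxonomic_status_aligned <-
      if (length(others) == 0) NA_character_ else paste0(others, collapse = " | ")
    keep
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

collapsed <- collapse_by_accepted_id(apc, status_order)

# Join onto aligned names, mirroring the dplyr left_join
input <- data.frame(
  aligned_name = c("Acacia minyura", "Selaginella australiensis", "Selaginella leptostachya"),
  stringsAsFactors = FALSE
)
result <- merge(input, collapsed[, c("canonical_name", "alternative_taxonomic_status_aligned")],
                by.x = "aligned_name", by.y = "canonical_name", all.x = TRUE)
result
```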
@dfalster @wcornwell Can you run the code at the bottom of the comment and think about the following questions:
What are we actually trying to document with the field `alternative_taxonomic_status_aligned` that is different to what we're documenting with splits/most likely species/collapses? I'm going in circles, seeing them as distinct vs near-identical concepts. The only place they differ is that `alternative_taxonomic_status_aligned` includes `misapplied` & `excluded`.
With `Selaginella australiensis`, if the `aligned_name` is `Selaginella australiensis`, there is no ambiguity in the `taxonomic_status` of the `aligned_name`; it is `accepted`. Same with all the synonyms: `Selaginella leptostachya` simply is a `taxonomic_synonym` of `Selaginella australiensis`, which is `accepted`.
- [ ] Is there a reason that the row for `Selaginella australiensis` should document the taxonomic status of all the synonyms (& like) of names for which `Selaginella australiensis` is the accepted name??
With `Acacia aneura`, if the `aligned_name` is `Acacia aneura`, the ambiguity in whether this is truly `Acacia aneura` or instead `Acacia paraneura` or `Acacia minyura` is documented with the columns about alternative accepted names (i.e. splits). And these also document the taxonomic status of the alternative names. It is true there is also `Acacia aneura` that has been `misapplied` to `Acacia quadrimarginea`. Maybe this is part of the `alternative_accepted_names` column for the `most_likely_species` option, but `Acacia quadrimarginea` is excluded from the list of `return_all`.
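A minimal base-R sketch of the distinction drawn here, on toy rows. The statuses for the split names are illustrative placeholders, and treating the self-matching `accepted` row as the most likely species is an assumption. `collapse_splits()` is a hypothetical helper, not an APCalign function: the split candidates collapse into one row with an `alternative_accepted_names` string, while a plain `misapplied` row such as `Acacia quadrimarginea` is dropped.

```r
# Toy candidate rows for one aligned name (statuses are illustrative placeholders)
candidates <- data.frame(
  aligned_name = rep("Acacia aneura", 4),
  accepted_name = c("Acacia aneura", "Acacia minyura", "Acacia paraneura", "Acacia quadrimarginea"),
  taxonomic_status = c("accepted", "pro parte misapplied", "pro parte misapplied", "misapplied"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per aligned name, alternatives listed with their status
collapse_splits <- function(candidates, drop_status = "misapplied") {
  cand <- candidates[!candidates$taxonomic_status %in% drop_status, ]
  # Put the row whose own status is "accepted" first (FALSE sorts before TRUE; order is stable)
  cand <- cand[order(cand$aligned_name, cand$taxonomic_status != "accepted"), ]
  rows <- lapply(split(cand, cand$aligned_name), function(g) {
    alts <- g[-1, ]
    data.frame(
      aligned_name = g$aligned_name[1],
      accepted_name = g$accepted_name[1],
      alternative_accepted_names = if (nrow(alts) == 0) NA_character_ else
        paste0(alts$accepted_name, " (", alts$taxonomic_status, ")", collapse = " | "),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

out <- collapse_splits(candidates)
out
```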
With `Acacia minyura`, if the `aligned_name` is `Acacia minyura`, there is no ambiguity in the `taxonomic_status` of the `aligned_name`; it is `accepted`. It isn't `pro parte misapplied`, so the row indicating that a plant identified as `Acacia aneura` might actually be `Acacia minyura` shouldn't cause `Acacia minyura` to have `pro parte misapplied` added as an `alternative_taxonomic_status_aligned`. As for its synonyms, that is the same as the `Selaginella` example above. So no `alternative_taxonomic_status_aligned` values should be added to `Acacia minyura`.
- [ ] What am I missing??
```r
resources$APC %>%
  dplyr::mutate(
    accepted_name = resources$`APC list (accepted)`$canonical_name[
      match(accepted_name_usage_ID, resources$`APC list (accepted)`$accepted_name_usage_ID)
    ]
  ) %>%
  dplyr::filter(species_and_infraspecific(taxon_rank)) %>%
  dplyr::filter(taxonomic_status != "excluded") %>%
  dplyr::select(canonical_name, accepted_name, accepted_name_usage_ID, taxon_ID, taxonomic_status, taxon_rank) %>%
  dplyr::filter(
    canonical_name %in% c("Acacia aneura", "Acacia minyura", "Acacia paraneura") |
      accepted_name_usage_ID %in% c(
        "https://id.biodiversity.org.au/node/apni/6707550",
        "https://id.biodiversity.org.au/node/apni/2915027",
        "https://id.biodiversity.org.au/node/apni/2914546"
      )
  ) %>%
  View()
```
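The alternative argued for in this comment, collapsing statuses per canonical name (the would-be `aligned_name`) rather than per accepted name, can be sketched in base R on toy rows. The values are illustrative and `collapse_by_canonical_name()` is a hypothetical helper. It treats the top-ranked status under a hypothetical `status_order` as the name's own `taxonomic_status_aligned` and reports only the remaining statuses as alternatives. Under this rule `Acacia minyura` and `Selaginella australiensis` get no alternatives, a synonym keeps its own status with no alternatives, and `Acacia aneura` retains the statuses of its own rows.

```r
# Toy rows (illustrative values, not real APC rows)
apc <- data.frame(
  canonical_name = c("Acacia aneura", "Acacia aneura", "Acacia aneura", "Acacia minyura",
                     "Selaginella australiensis", "Selaginella leptostachya"),
  accepted_name = c("Acacia aneura", "Acacia minyura", "Acacia quadrimarginea", "Acacia minyura",
                    "Selaginella australiensis", "Selaginella australiensis"),
  taxonomic_status = c("accepted", "pro parte misapplied", "misapplied", "accepted",
                       "accepted", "taxonomic synonym"),
  stringsAsFactors = FALSE
)

# Hypothetical status ranking; the top-ranked status is taken as the name's own status
status_order <- c("accepted", "taxonomic synonym", "pro parte misapplied", "misapplied")

collapse_by_canonical_name <- function(apc, status_order) {
  apc <- apc[order(apc$canonical_name, match(apc$taxonomic_status, status_order)), ]
  rows <- lapply(split(apc, apc$canonical_name), function(g) {
    own_status <- g$taxonomic_status[1]
    # Other statuses carried by this same name, excluding its own status
    others <- setdiff(unique(g$taxonomic_status), own_status)
    data.frame(
      aligned_name = g$canonical_name[1],
      taxonomic_status_aligned = own_status,
      alternative_taxonomic_status_aligned =
        if (length(others) == 0) NA_character_ else paste0(others, collapse = " | "),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

res <- collapse_by_canonical_name(apc, status_order)
res
```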
> **Is there a reason that the row for Selaginella australiensis should document the taxonomic status of all the synonyms (& like) of names for which Selaginella australiensis is the accepted name??**
You would want to do this if you were building a webpage for the species; ALA and POWO must have done something like that to get this: https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:90399-3#synonyms or https://bie.ala.org.au/species/https://id.biodiversity.org.au/node/apni/2915027#names
But I'd argue it's beyond the scope of 99.99% (possibly 100%) of use cases for APCalign. I can't really think why you'd want to do this for more than one name at once if you're not building a flora or flora-like resource. I sometimes look up ALA or POWO to see the synonyms of a single name, but I can't imagine why I'd need to do that for 10 or 100 or 1000 names.
So I'd argue that is beyond (current) scope for this project.
And we wouldn't actually be reporting the synonyms, just that there are synonyms effectively. So I'm leaving it out at this point. It isn't about aligning a name at all.
Closed by commit ac799c3
The parameter `taxonomic_splits` requires an additional option, `keep_taxonomic_splits`. This option would only maintain duplicate rows for a canonical name where there is true ambiguity about which current canonical name is being referenced by a given aligned name. This would contrast with `return_all`, which would return all rows where a synonym (or other taxonomic status) exists.
For instance, the once-upon-a-time taxon concept `Acacia aneura` has been split into 3 taxa: `Acacia aneura`, `Acacia minyura` and `Acacia paraneura`. However, there is one additional entry under the canonical name `Acacia quadrimarginea`, which is a misapplied use. There are also many synonyms where there is no ambiguity.
These should be separate outputs; we need to think about how to structure these and probably change how `update_taxonomy` functions, based on the desired output.
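A minimal base-R sketch of how the two proposed options might differ in which rows they keep. `select_candidate_rows()` is a hypothetical helper, the candidate statuses are illustrative placeholders, and the rule used for "true ambiguity" (the name's own `accepted` row plus any `pro parte` row) is an assumption, not a settled definition. Under that rule, `keep_taxonomic_splits` keeps the three split taxa and drops the plain `misapplied` row for `Acacia quadrimarginea`, while `return_all` keeps every row.

```r
# Toy candidate rows for one aligned name (statuses are illustrative placeholders)
candidates <- data.frame(
  aligned_name = rep("Acacia aneura", 4),
  accepted_name = c("Acacia aneura", "Acacia minyura", "Acacia paraneura", "Acacia quadrimarginea"),
  taxonomic_status = c("accepted", "pro parte misapplied", "pro parte misapplied", "misapplied"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which candidate rows each proposed option would keep
select_candidate_rows <- function(candidates,
                                  taxonomic_splits = c("keep_taxonomic_splits", "return_all")) {
  taxonomic_splits <- match.arg(taxonomic_splits)
  if (taxonomic_splits == "return_all") {
    return(candidates)
  }
  # Assumed rule for "true ambiguity": the accepted row plus pro parte rows
  ambiguous <- candidates$taxonomic_status == "accepted" |
    startsWith(candidates$taxonomic_status, "pro parte")
  candidates[ambiguous, ]
}

kept <- select_candidate_rows(candidates, "keep_taxonomic_splits")
nrow(kept)                                               # 3
nrow(select_candidate_rows(candidates, "return_all"))    # 4
```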