The challenge here is the diversity of inputs to parse_tax_data and the different ways ranks can be encoded:
A character vector of classifications (e.g. Animalia;Chordata;Mammalia;Primates;Hominidae) might have the rank information embedded (e.g. k_Animalia;p_Chordata;c_Mammalia;o_Primates;f_Hominidae), so a new class_key option like "taxon_rank" could be added.
A data.frame could have a column like the character vector above, or could have one column per rank, which would mean the ranks are the column names. This would need either a new class_key option or a new option that indicates ranks should be taken from column names.
A list of data.frames, one per classification, would store its rank info in a column, so the name of the column with rank information would need to be passed to a new option.
A list of character vectors, one per classification, could store its rank info as the names of the vectors.
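To make the four encodings above concrete, here is a small base R sketch of what each input might look like. The object and column names (class_vec, rank_cols_df, "rank", and so on) are made up for illustration and are not anything parse_tax_data requires.

```r
# 1. Character vector with the rank embedded as a prefix of each taxon name
class_vec <- c("k_Animalia;p_Chordata;c_Mammalia")

# 2a. data.frame with a classification column like the vector above
class_df <- data.frame(sample = "s1", classification = class_vec,
                       stringsAsFactors = FALSE)

# 2b. data.frame with one column per rank (the ranks are the column names)
rank_cols_df <- data.frame(kingdom = "Animalia", phylum = "Chordata",
                           class = "Mammalia", stringsAsFactors = FALSE)

# 3. List of data.frames, one per classification, with ranks in a column
df_list <- list(
  data.frame(name = c("Animalia", "Chordata", "Mammalia"),
             rank = c("kingdom", "phylum", "class"),
             stringsAsFactors = FALSE)
)

# 4. List of character vectors, one per classification, named by rank
chr_list <- list(
  c(kingdom = "Animalia", phylum = "Chordata", class = "Mammalia")
)
```

In cases 2b and 4 the rank only exists as a name attribute, in case 3 it is a regular column, and in cases 1 and 2a it has to be parsed out of the taxon string.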
This means there are 4 ways to encode rank and handling all would require at least 2 new options and a new class_key value. The class_key value is intuitive and would not clutter the help page any, but I hesitate to add 2-3 options to handle rank.
Currently, I am thinking of doing the following:
Add a value to class_key called "taxon_rank" for when the rank and taxon name are together in a vector/column (quite common).
Add a TRUE/FALSE option called named_by_rank so users can say when column/vector names are ranks (somewhat common).
This will not handle lists of data.frames, but that is the least common input type. Maybe another option called "rank_col" could be added in the future.
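As a rough sketch of what these two additions would have to do internally, the base R helpers below extract ranks in both ways. Both functions (split_rank_prefix and ranks_from_names) are hypothetical and are not part of metacoder; the first also assumes the rank code is separated from the taxon name by one or more underscores.

```r
# Hypothetical helper for the proposed "taxon_rank" class_key value:
# split entries like "k_Animalia" into a rank code and a taxon name.
# Assumption: rank code, then one or more underscores, then the name.
split_rank_prefix <- function(classification, sep = ";") {
  taxa <- strsplit(classification, sep, fixed = TRUE)[[1]]
  data.frame(
    taxon_rank = sub("^([^_]+)_+.*$", "\\1", taxa),
    taxon_name = sub("^[^_]+_+", "", taxa),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper for the proposed named_by_rank option:
# take ranks from the names of a vector or of a one-row data.frame.
ranks_from_names <- function(x) {
  data.frame(
    taxon_rank = names(x),
    taxon_name = unname(as.character(unlist(x))),
    stringsAsFactors = FALSE
  )
}

prefixed <- split_rank_prefix("k_Animalia;p_Chordata;c_Mammalia")
named <- ranks_from_names(c(kingdom = "Animalia", phylum = "Chordata"))
```

Here prefixed holds the rank codes k, p, c next to the bare taxon names, and named holds kingdom and phylum taken from the vector names. A real implementation would also need to decide what to do with entries that have no recognizable prefix.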
For context, the Taxon objects created for the taxmap output of parse_tax_data currently do not add rank information, although it is often available in various forms. This means taxon_ranks() does not work as expected (see https://github.com/grunwaldlab/metacoder/issues/188 and https://github.com/grunwaldlab/metacoder/issues/189).