GregGuerin closed this issue 4 years ago.
I will try to do some preliminary work on adding a user option to use herbarium_determination or the new standardised name field(s), but we will have to wait for the new fields before we can strip out non-vascular plants and incomplete records.
I have written some new code to select either herbarium_determination or the new standardised name field, but we will want to think more about how to handle incomplete names. Beyond the obvious ones, such as "no_id", how do we want to treat IDs that only go down to genus level? Excluding an accurate genus ID would needlessly lower diversity estimates, but presumably the new standardised name field will only include genus_species where available? I think the standardised name field may need to be rethought as the "lowest possible identification", with other columns that give only the genus_species where available?
The standardised name data should be able to return Genus sp. records, but we need to see an example of the output! They should be included by default, with an option/argument for stripping them out to get 'clean' species lists/matrices (for example, because Genus sp. records at different sites could be treated as the same taxon in turnover analysis). Again, it is hard to know the best way to filter without seeing the fields: it could be a flag for ID level, 'sp.' as the epithet, etc., remembering that 'phrase names' also have sp. in the name.
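If there is no ID-level flag yet, one stopgap is a name-string heuristic. The Python sketch below is purely illustrative (the function names and the regular expression are assumptions, not part of the package): a record counts as genus-only only when nothing follows "sp."/"spp.", so phrase names such as "Genus sp. Locality (Collector 123)" survive as distinct taxa, and Genus sp. records are kept unless stripping is requested.

```python
import re

# Hypothetical heuristic (an assumption, not a database field): a bare
# genus-level ID is one capitalised word followed only by "sp", "sp.", "spp"
# or "spp.". Phrase names carry extra text after "sp." and so do not match.
GENUS_ONLY = re.compile(r"^[A-Z][a-z-]+\s+spp?\.?$")


def is_genus_only(name: str) -> bool:
    """Return True if `name` looks like an unqualified 'Genus sp.' record."""
    return bool(GENUS_ONLY.match(name.strip()))


def drop_genus_only(names, strip=False):
    """Keep Genus sp. records by default; remove them only when strip=True."""
    if not strip:
        return list(names)
    return [n for n in names if not is_genus_only(n)]


names = ["Acacia aneura", "Acacia sp.", "Acacia sp. Example Locality (A. Collector 123)"]
print(drop_genus_only(names, strip=True))
```

A proper ID-level field from the database would be more reliable than string matching; this is only a fallback while the new fields are unknown.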
I have some code that can "strip" a species name down to genus_species; I wrote it for my own work last year.
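That code isn't shown here, so the following is only a hypothetical Python sketch of what reducing a determination to genus_species could look like: keep the first two words, dropping infraspecific ranks and authorities. Names without a species epithet (a bare genus, "Genus sp.", phrase names, "no_id") return None; whether phrase names should instead be kept whole is an open design choice.

```python
from typing import Optional

# Epithet placeholders that mean "not identified to a formal species".
NOT_AN_EPITHET = {"sp", "spp"}


def genus_species(name: str) -> Optional[str]:
    """Reduce a determination to 'Genus species', or None if that is not possible.

    Anything after the second word (subsp., var., authorities) is discarded.
    """
    words = name.split()
    if len(words) < 2:
        return None  # e.g. "no_id" or a bare genus
    genus, epithet = words[0], words[1]
    if epithet.rstrip(".") in NOT_AN_EPITHET:
        return None  # "Genus sp." or a phrase name
    return f"{genus} {epithet}"
```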
I could write something that deletes all the "hits" in veg.PI that do not provide a genus_species, but I am not sure that would be easier than adding another column that flags the ID level. The code would be much cleaner with an ID-level flag.
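A rough sketch of the flag-column approach, in Python for illustration only. Until the database supplies an ID-level field, the hypothetical id_level() below derives one from the name string; the record keys ("herbarium_determination" plus the added "id_level") and the level labels are assumptions. Filtering then becomes a simple selection on the flag instead of name parsing wherever it is needed.

```python
from typing import Dict, Iterable, List, Set


def id_level(name: str) -> str:
    """Hypothetical ID-level flag: 'none', 'genus', 'phrase' or 'species'."""
    words = name.split()
    if not words or name.strip().lower() == "no_id":
        return "none"
    if len(words) == 1:
        return "genus"
    if words[1].rstrip(".") in {"sp", "spp"}:
        # "Genus sp." alone is genus-level; extra text makes it a phrase name.
        return "genus" if len(words) == 2 else "phrase"
    return "species"


def flag_records(records: Iterable[Dict],
                 name_field: str = "herbarium_determination") -> List[Dict]:
    """Return copies of the hit records with an added 'id_level' key."""
    return [{**r, "id_level": id_level(r[name_field])} for r in records]


def keep_levels(records: Iterable[Dict], levels: Set[str]) -> List[Dict]:
    """Keep only records whose ID-level flag is in `levels`."""
    return [r for r in records if r["id_level"] in levels]
```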
With the new taxonomy fields from the database, add a user option to use herbarium_determination or the new standardised name field(s), making it easy to get a table of standardised names at species, genus or family level, or at the lowest available level.
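The user option could reduce to a single labelling function that every downstream step calls. In this hypothetical Python sketch, the "standardised_name", "genus" and "family" keys are placeholders for whatever the new database fields end up being called, and the level names are assumptions; the defaults are the standardised name at the lowest available level.

```python
from typing import Dict, Optional

NAME_FIELDS = {"herbarium_determination", "standardised_name"}
LEVELS = {"lowest", "species", "genus", "family"}


def taxon_label(record: Dict[str, str],
                name_field: str = "standardised_name",
                level: str = "lowest") -> Optional[str]:
    """Return the taxon name to use for one hit record, or None if unavailable."""
    if name_field not in NAME_FIELDS:
        raise ValueError(f"name_field must be one of {sorted(NAME_FIELDS)}")
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    if level in ("genus", "family"):
        # Assumes the database supplies standardised genus/family fields.
        return record.get(level)
    name = record.get(name_field)
    if not name:
        return None
    if level == "lowest":
        return name
    # level == "species": trim to the binomial, or None if not identified to species.
    words = name.split()
    if len(words) < 2 or words[1].rstrip(".") in {"sp", "spp"}:
        return None
    return " ".join(words[:2])
```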
Changes must apply to every m_kind (PA, percent_cover, freq, IVI).
The current output is just the species x sites matrix, so defaulting the updated function to, for example, the standardised taxon at the lowest level won't break existing upstream or downstream code.
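One way to make the naming choice apply to every m_kind is to resolve names once, before any aggregation. The Python sketch below is hypothetical: it covers only PA and percent_cover (freq and IVI are left out), assumes hit records with "site", "point_id" and name keys, and takes the number of intercept points per site as an input. Because the label function is a parameter, switching name field or level changes the taxa but not the shape of the output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional


def species_matrix(hits: Iterable[Dict],
                   total_points: Dict[str, int],
                   m_kind: str = "PA",
                   label: Callable[[Dict], Optional[str]] = lambda h: h["standardised_name"],
                   ) -> Dict[str, Dict[str, float]]:
    """Build a sparse {site: {taxon: value}} table from point-intercept hits.

    Absent taxa are simply missing (i.e. zero). Percent cover is taken here as the
    percentage of a site's intercept points where the taxon was hit at least once.
    """
    if m_kind not in ("PA", "percent_cover"):
        raise NotImplementedError("this sketch only covers PA and percent_cover")
    points = defaultdict(set)  # (site, taxon) -> distinct points hit
    for h in hits:
        taxon = label(h)
        if taxon is None:
            continue  # name not resolvable at the requested level
        points[(h["site"], taxon)].add(h["point_id"])
    matrix: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (site, taxon), pts in points.items():
        if m_kind == "PA":
            matrix[site][taxon] = 1
        else:
            matrix[site][taxon] = 100 * len(pts) / total_points[site]
    return dict(matrix)
```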
In addition to the choice between herbarium and standardised determinations and the taxonomic level, there should be an option that defaults to stripping out non-vascular plant species, and an option to exclude records that are not fully identified.
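A minimal Python sketch of those two filters, assuming a hypothetical boolean "vascular" field (it does not exist yet, so records without it are kept) and treating "not fully identified" as no name, "no_id", a bare genus or an unqualified "Genus sp."; phrase names are counted as identified here, which is itself a choice to confirm. Stripping non-vascular plants defaults to on and excluding incomplete records defaults to off, so Genus sp. records stay in unless the user asks otherwise.

```python
from typing import Dict, Iterable, List


def is_incomplete(name: str) -> bool:
    """True for missing names, 'no_id', a bare genus, or an unqualified 'Genus sp.'."""
    words = name.split()
    if not words or name.strip().lower() == "no_id":
        return True
    if len(words) == 1:
        return True
    return len(words) == 2 and words[1].rstrip(".") in {"sp", "spp"}


def filter_hits(hits: Iterable[Dict],
                strip_nonvascular: bool = True,
                exclude_incomplete: bool = False,
                name_field: str = "standardised_name") -> List[Dict]:
    """Apply the optional record filters before building any species table."""
    kept = []
    for h in hits:
        if strip_nonvascular and not h.get("vascular", True):
            continue  # hypothetical flag; missing flag counts as vascular
        if exclude_incomplete and is_incomplete(h.get(name_field) or ""):
            continue
        kept.append(h)
    return kept
```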