damianooldoni closed this issue 5 years ago
List of columns (in order) of the verification file:
taxonKey: T
scientificName
datasetKey
bb_key: B
bb_scientificName
bb_kingdom
bb_rank
bb_taxonomicStatus
bb_acceptedKey: A
bb_acceptedName
bb_acceptedKingdom: expected to be the same as bb_kingdom
bb_acceptedRank
bb_acceptedTaxonomicStatus: expected to always be ACCEPTED
verificationKey
remarks
dateAdded
Note:
interim/verification_file.tsv
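The column order listed above can be checked mechanically before the file is handed to taxonomists. What follows is a minimal sketch in Python (not the R code of trias), assuming a tab-separated file whose first row is the header. The names `EXPECTED_COLUMNS` and `check_header` are hypothetical and not part of any package.

```python
import csv
import io

# Column order as listed in this thread (assumption: the file still
# follows this list; extend it if further columns are agreed on).
EXPECTED_COLUMNS = [
    "taxonKey", "scientificName", "datasetKey",
    "bb_key", "bb_scientificName", "bb_kingdom", "bb_rank",
    "bb_taxonomicStatus", "bb_acceptedKey", "bb_acceptedName",
    "bb_acceptedKingdom", "bb_acceptedRank", "bb_acceptedTaxonomicStatus",
    "verificationKey", "remarks", "dateAdded",
]


def check_header(tsv_text, expected=EXPECTED_COLUMNS):
    """Return a list of problems found in the header row of a TSV string."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty list if the text has no rows
    problems = []
    missing = [c for c in expected if c not in header]
    extra = [c for c in header if c not in expected]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    # Same set of names but a different sequence (or duplicated names).
    if not missing and not extra and header != list(expected):
        problems.append("columns are present but in a different order")
    return problems
```

An empty result means the header matches the agreed order exactly. To check a file on disk, read it as text and pass the content to `check_header`.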
While working on this, I found that it would be very practical to have a boolean column indicating whether the synonym relation is outdated or not. Up to now, we agreed to just add "Outdated synonym." to the remarks column.
Some advantages of adding such a column:
verify_taxa() is more readable.
verify_taxa() decreases.
verify_taxa() works faster.
Suggested column name: outdated. @peterdesmet: what do you think?
Agreed, as last column.
I would say second to last column. I would leave remarks as the last one. @peterdesmet: Do you agree?
I see that dateAdded is the last one in your description. Ok, then I put outdated as last one! :+1:
Yeah, the remarks field is moved forward so that editors can easily add things there.
Based on the triplet T, B and A (c("taxonKey", "bb_key", "bb_acceptedKey")), the outdated synonyms are detected while detecting unused taxa. That's nice actually.
I would call such an extra column used instead of outdated. Accepted values: TRUE (in use), FALSE (not in use).
I understand your reasoning, but the active step is marking something as outdated (TRUE) when the script is rerun. The other values (FALSE) are just default values: saying it is used can be misleading (e.g. if there is no verification key it will not be "used").
I understand your reasoning and I agree with it.
Up to now, one of the informative output data frames of verify_taxa() is unused_taxa: it includes all taxa in verification_file.csv which are not in the input taxa, independent of the verification key.
Following your reasoning, I would rename this data frame to outdated_taxa.
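The triplet-based detection discussed in this thread can be sketched as follows. This is an illustrative Python sketch, not the actual R implementation in trias: a row of the verification file is marked `outdated = TRUE` when its (taxonKey, bb_key, bb_acceptedKey) triplet no longer occurs in the input taxa, independent of the verification key. The function names and the idea of passing rows as dictionaries are assumptions made for the example.

```python
# The triplet T, B, A used to identify a row.
KEY_COLUMNS = ("taxonKey", "bb_key", "bb_acceptedKey")


def triplet(row):
    """Return the (T, B, A) key of a row given as a dict."""
    return tuple(row.get(col) for col in KEY_COLUMNS)


def mark_outdated(verification_rows, input_taxa_rows):
    """Return copies of the verification rows with a boolean 'outdated' field.

    A row is outdated when its triplet is absent from the input taxa.
    The verification key plays no role in this decision.
    """
    current = {triplet(row) for row in input_taxa_rows}
    marked = []
    for row in verification_rows:
        new_row = dict(row)  # leave the caller's data untouched
        # Added last, so 'outdated' ends up as the last column.
        new_row["outdated"] = triplet(row) not in current
        marked.append(new_row)
    return marked


def outdated_taxa(verification_rows, input_taxa_rows):
    """Return only the rows marked as outdated."""
    return [r for r in mark_outdated(verification_rows, input_taxa_rows)
            if r["outdated"]]
```

With this layout FALSE is simply the default value, and only rows whose triplet disappeared on a rerun switch to TRUE, which matches the reasoning for naming the column outdated rather than used.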
@damianooldoni I think this can be closed?
The file verification_file.csv will be used by taxonomists for verifying taxa. At the moment, it contains the following columns in the following order:
scientificName
bb_scientificName
bb_taxonomicStatus
bb_acceptedName
bb_key
bb_acceptedKey
bb_kingdom
issues
verification_key
date_added
checklists
remarks
This structure is optimal for experts but too difficult to manage. The main source of bugs is the combination of the following properties:
scientificName allowed
Below are the columns of taxa.csv. I checked the box next to the columns I think we need to include in verification_file.csv:
taxonKey
scientificName
taxonID
datasetKey
nameType
issues
validDistribution
bb_key
bb_scientificName
bb_species
bb_genus
bb_family
bb_order
bb_class
bb_phylum
bb_kingdom
bb_rank
bb_speciesKey
bb_taxonomicStatus
bb_acceptedKey
bb_acceptedName
In addition to these columns, the following ones are specific to verification_file.csv and should be present:
verificationKey
dateAdded
remarks
Based on what we decide in this issue, I will modify (= simplify) trias::verify_taxa(). @peterdesmet What do you think?