Closed ehwenk closed 1 year ago
I've worked out why we excluded entity_type from the columns to be read in. The allowable values for entity_type specified in the schema are individual, population, metapopulation, species, genus, family, and order. Many studies have columns with these same names, and the values in such a column were then being used as the entity_type rather than the fixed value. Therefore, we need to specify that entity_type can only be read in from a column whose name is NOT one of the allowable entity_type values specified in the schema.
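A minimal sketch of that rule, in base R. The helper name resolve_entity_type and the example data are hypothetical and are not taken from process.R; only the list of allowed values comes from the comment above.

```r
# Allowed entity_type values, as listed in the comment above.
allowed_entity_types <- c("individual", "population", "metapopulation",
                          "species", "genus", "family", "order")

# Hypothetical helper (not the package's function): turn the metadata entry
# for entity_type into one value per row of `data`.
resolve_entity_type <- function(data, entry, allowed = allowed_entity_types) {
  is_column <- entry %in% names(data)
  is_fixed <- entry %in% allowed
  if (is_column && !is_fixed) {
    # The entry names a column and is not itself an allowed value:
    # read the per-row values from that column.
    return(as.character(data[[entry]]))
  }
  # Otherwise treat the entry as a fixed value, even when a column with the
  # same name (e.g. "species") exists in the data.
  rep(entry, nrow(data))
}

# Made-up example data with a `species` column holding taxon names.
data <- data.frame(
  species = c("Acacia aneura", "Banksia serrata"),
  level = c("individual", "population"),
  stringsAsFactors = FALSE
)

fixed <- resolve_entity_type(data, "species")     # fixed value for every row
from_column <- resolve_entity_type(data, "level") # values read from `level`
```

The point of the check is that a column called species holds taxon names, so it should never be mistaken for a source of entity_type values.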
With value_type, the instances that weren't reading in correctly were long datasets where value_type was specified for each trait, rather than a single time for the dataset. When the column was instead specified at the dataset level, it worked properly. It seems like it should work either way, so that values for a species trait can be read in from a column? But I realise this might be hard with long datasets, and for now the problem is solved.
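To illustrate what "specified for each trait" could mean for a long dataset, here is a rough base R sketch. The function add_value_type, the column names and the value_type values are all assumptions made for illustration; this is not how the package currently does it.

```r
# Made-up long-format data: one row per measurement.
long_data <- data.frame(
  taxon_name = c("A", "A", "B"),
  trait_name = c("leaf_length", "seed_mass", "leaf_length"),
  value = c("12", "3.4", "15"),
  stringsAsFactors = FALSE
)

# Made-up per-trait metadata; the value_type values are placeholders.
trait_meta <- data.frame(
  trait_name = c("leaf_length", "seed_mass"),
  value_type = c("mean", "raw"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach a value_type to every row, preferring the
# per-trait entry and falling back to a dataset-level value if there is none.
add_value_type <- function(data, trait_meta, dataset_value = NA_character_) {
  idx <- match(data$trait_name, trait_meta$trait_name)
  per_trait <- trait_meta$value_type[idx]
  data$value_type <- ifelse(is.na(per_trait), dataset_value, per_trait)
  data
}

with_types <- add_value_type(long_data, trait_meta)
```

With this shape, the dataset-level and trait-level specifications end up in the same per-row column, which is roughly the "should work either way" behaviour described above.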
And while the problems are solved for AusTraits, AusInverTraits is having problems with some of these fields. I'll check if it is because all their data are in long format.
Ok, this commit should ensure that entity_type can only be read in from a column that is NOT one of the allowable entity_type or value_type values specified in schema.
To test:
```r
devtools::load_all()
source("scripts/custom.R")

# Load the schema, trait definitions, unit conversions and taxon list
resource_metadata <- get_schema("config/metadata.yml", "metadata")
definitions <- get_schema("config/traits.yml", "traits")
unit_conversions <- get_unit_conversions("config/unit_conversions.csv")
taxon_list <- read_csv_char("config/taxon_list.csv")
schema <- get_schema()

# Configure and process a single dataset, then inspect the traits table
v <- "Brock_1993"
config <- dataset_configure(file.path("data", v, "metadata.yml"), definitions, unit_conversions)
raw <- dataset_process(file.path("data", v, "data.csv"), config, schema, resource_metadata)
raw$traits
```
@ehwenk also pointed out that, for datasets in long format, we might want to bring in columns of data for fields like entity_type and value_type, and that currently this isn't possible.
value_type should be able to be read in from a column, but it isn't working at the moment for AusTraits or for AusInverTraits.
It seems that we added an exception for entity_type in line 1066 of process.R, which doesn't make sense; but when I remove this exception, it still isn't read in from a column.
See also issue #674, which is about adding "unit_in" to the list of fields that can be read in. There are datasets where different rows of data have different units for a trait, especially for scraped morphology datasets (AusTraits & AusInverTraits). At the moment custom_R_code is being used to align units, but this is clunky.
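As a rough idea of what reading unit_in from a column could allow, here is a base R sketch of per-row unit conversion. The helper convert_rows, the data and the conversion factors are hypothetical; the real conversions would come from config/unit_conversions.csv, whose format is not shown here.

```r
# Made-up measurements where each row carries its own unit.
measurements <- data.frame(
  trait_name = c("leaf_length", "leaf_length", "leaf_length"),
  value = c(12, 1.5, 0.02),
  unit_in = c("mm", "cm", "m"),
  stringsAsFactors = FALSE
)

# Assumed conversion factors to millimetres, keyed by the incoming unit.
to_mm <- c(mm = 1, cm = 10, m = 1000)

# Hypothetical helper: convert every row using the factor for its unit_in.
convert_rows <- function(data, factors, unit_out) {
  unknown <- setdiff(unique(data$unit_in), names(factors))
  if (length(unknown) > 0) {
    stop("No conversion for unit(s): ", paste(unknown, collapse = ", "))
  }
  data$value <- data$value * unname(factors[data$unit_in])
  data$unit <- unit_out
  data
}

converted <- convert_rows(measurements, to_mm, "mm")
```

Something along these lines would replace the per-dataset unit alignment currently done in custom_R_code.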