traitecoevo / austraits.build

Source for AusTraits

Errors reading in `entity_type`, `value_type` from columns #725

Closed ehwenk closed 1 year ago

ehwenk commented 1 year ago

I've worked out why we excluded entity_type from the columns to be read in. The allowable values for entity_type specified in the schema are individual, population, metapopulation, species, genus, family, and order. Many studies have columns with these same names, and the values in such a column then get used as the entity_type instead of the fixed value. Therefore, we need to specify that entity_type can only be read in from a column whose name is NOT one of the allowable entity_type values specified in the schema.
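The rule proposed above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the austraits.build implementation: `resolve_entity_type()` and `allowed_entity_types` are hypothetical names, the allowable values are copied from the list above rather than read from the schema, and the example data are made up.

```r
# Allowable entity_type values, copied from the comment above
# (assumption: in the real package these would come from the schema).
allowed_entity_types <- c("individual", "population", "metapopulation",
                          "species", "genus", "family", "order")

# Hypothetical helper: decide whether the `entity_type` entry in a dataset's
# metadata is a fixed value or the name of a column in the data.
resolve_entity_type <- function(data, entity_type) {
  if (entity_type %in% allowed_entity_types) {
    # An allowable value is always treated as fixed, even if the data
    # happen to contain a column with the same name (e.g. "species").
    return(rep(entity_type, nrow(data)))
  }
  if (entity_type %in% names(data)) {
    # Otherwise a matching column name means "read the values from here".
    return(as.character(data[[entity_type]]))
  }
  stop("entity_type is neither an allowable value nor a column: ", entity_type)
}

# Made-up data with a column called "species" and a separate column
# holding per-row entity types.
data <- data.frame(
  species = c("Acacia aneura", "Banksia serrata"),
  level = c("individual", "population"),
  stringsAsFactors = FALSE
)

fixed <- resolve_entity_type(data, "species")      # fixed value, not the column
from_column <- resolve_entity_type(data, "level")  # values read from `level`
```

With this check, a metadata entry of `species` is never mistaken for the `species` column, while a column with any other name can still supply per-row values.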

ehwenk commented 1 year ago

With value_type, the instances that weren't reading in correctly were long datasets where value_type was specified for each trait rather than once for the dataset. When the column was instead specified at the dataset level, it worked properly. It seems like it should work either way, letting values for a species trait be read in from a column? But I realise this might be hard with long datasets, and for now the problem is solved.
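To make the two cases concrete, here is a small base R sketch of a long-format table where value_type is either set once for the whole dataset or looked up per trait. The helper `assign_value_type()`, the column names, and the example values are all assumptions for illustration; this does not reflect how austraits.build handles value_type internally.

```r
# Made-up long-format data: one row per (taxon, trait) measurement.
long <- data.frame(
  taxon_name = c("sp_a", "sp_a", "sp_b"),
  trait_name = c("leaf_area", "plant_height", "leaf_area"),
  value = c("12.1", "3.4", "9.8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `value_type` is either a single dataset-level value
# or a named character vector giving one value per trait.
assign_value_type <- function(long, value_type) {
  if (is.null(names(value_type))) {
    # Dataset level: a single value, recycled to every row.
    stopifnot(length(value_type) == 1)
    long$value_type <- rep(value_type, nrow(long))
  } else {
    # Trait level: look up each row's trait by name.
    long$value_type <- unname(value_type[long$trait_name])
    if (anyNA(long$value_type)) {
      stop("no value_type given for some traits")
    }
  }
  long
}

by_dataset <- assign_value_type(long, "mean")
by_trait <- assign_value_type(
  long,
  c(leaf_area = "mean", plant_height = "maximum")
)
```

The trait-level branch is the case described above as not reading in correctly; the sketch only shows what the intended result would look like for each row.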

ehwenk commented 1 year ago

And while the problems are solved for AusTraits, AusInverTraits is having problems with some of these fields. I'll check if it is because all their data are in long format.

dfalster commented 1 year ago

Ok, this commit should ensure that entity_type can only be read in from a column that is NOT one of the allowable entity_type or value_type values specified in schema.

To test

# Load the package from the working directory, plus project-level scripts and config
devtools::load_all()
source("scripts/custom.R")
resource_metadata <- get_schema("config/metadata.yml", "metadata")
definitions <- get_schema("config/traits.yml", "traits")
unit_conversions <- get_unit_conversions("config/unit_conversions.csv")
taxon_list <- read_csv_char("config/taxon_list.csv")
schema <- get_schema()

# Configure and process a single dataset
v <- "Brock_1993"
config <- dataset_configure(file.path("data", v, "metadata.yml"), definitions, unit_conversions)
raw <- dataset_process(file.path("data", v, "data.csv"), config, schema, resource_metadata)

# Inspect the processed traits table
raw$traits

Before: [image]

After: [image]

dfalster commented 1 year ago

@ehwenk also pointed out that when data are in long format, we might want to bring in columns of data for things like entity_type and value_type, and that this currently isn't possible.

dfalster commented 1 year ago

Moved to https://github.com/traitecoevo/traits.build/issues/6