ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0

Determine explicitly the type of each column of the master data #32

Closed by maurolepore 6 years ago

maurolepore commented 6 years ago

To ensure column types are interpreted as expected, write a function that outputs the column types to be passed to col_types (see ?readr::read_csv). For example, I used this approach in fgeo.tool::type_vft (https://forestgeo.github.io/fgeo.tool/reference/type_fgeo.html). Erika documented the type of each column in the metadata tables.
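One way to go from documented types to something col_types can use, sketched under assumptions: the metadata table below, its field and type columns, and the helper name types_from_metadata() are all hypothetical placeholders, not the real allodb metadata or an existing function.

```r
# Hypothetical metadata table: one row per column of the master data.
# Field names and types are placeholders, not the real allodb fields.
metadata <- data.frame(
  field = c("species", "site", "dbh_min_cm", "n_trees"),
  type = c("character", "character", "numeric", "integer"),
  stringsAsFactors = FALSE
)

# Map documented type names to readr's single-letter codes
type_codes <- c(character = "c", integer = "i", numeric = "d", logical = "l")

# Hypothetical helper: return a named list of codes, one per documented field
types_from_metadata <- function(metadata) {
  codes <- unname(type_codes[metadata$type])
  unknown <- metadata$field[is.na(codes)]
  if (length(unknown) > 0) {
    stop("No readr code for: ", paste(unknown, collapse = ", "))
  }
  stats::setNames(as.list(codes), metadata$field)
}

types <- types_from_metadata(metadata)
str(types)
```

Stopping on an unknown type name, rather than silently guessing, keeps the documented metadata and the import in agreement.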

The code below outputs a list that can be passed to the col_types argument of readr::read_csv(). Before I can do this I need help from @gonzalezeb. I asked her to clean the column names of the master data (#31).


# Import and clean --------------------------------------------------------

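library(tidyverse)

path_to_data <- here::here("data-raw/allotemp_main.csv")
# First run: types_allodb_master (defined below) does not exist yet, so let
# readr guess the types. Once the list below is filled in, re-import with
# col_types = types_allodb_master.
master <- readr::read_csv(path_to_data)

# This prints to screen the contents of types_allodb_master (see below)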

types <- map(master, class) %>%
  enframe() %>%
  unnest(cols = value) %>%
  mutate(
    type = case_when(
      value == "character" ~ "c",
      value == "integer" ~ "i",
      value == "numeric" ~ "d"
    ),
    type = paste0(name, " = '", type, "',")
  ) %>%
  pull(type)

cat(types)

# Determine column type explicitly to avoid surprises (see readr::read_csv)
# c = character,
# i = integer,
# n = number,
# d = double,
# l = logical,
# D = date,
# T = date time,
# t = time,
# ? = guess,
# _/- to skip the column.
types_allodb_master <- list(
  # xxx
)

Once this is done, I need to ask @gonzalezeb to confirm the type of each variable.
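The generator above maps only character, integer, and numeric columns, so any other class would come out as NA. A base-R variant that also handles logical columns and falls back to "?" (guess) might look like the sketch below. The data frame and the helper name code_for_class() are hypothetical stand-ins, not the real master data.

```r
# Hypothetical stand-in for the master data (not the real columns)
master_example <- data.frame(
  species = c("Acer rubrum", "Quercus alba"),
  n_trees = c(12L, 7L),
  dbh_min_cm = c(1.5, 2),
  is_shrub = c(FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate a column's first class into a readr code
code_for_class <- function(x) {
  switch(class(x)[1],
    character = "c",
    integer = "i",
    numeric = "d",
    logical = "l",
    "?" # fall back to "guess" for any other class
  )
}

codes <- vapply(master_example, code_for_class, character(1))

# Print one entry per line, ready to paste into a list() definition
entries <- sprintf("  %s = \"%s\"", names(codes), codes)
cat("list(\n", paste(entries, collapse = ",\n"), "\n)\n", sep = "")
```

Printing one entry per line makes the output easier to review column by column before pasting it into types_allodb_master.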

maurolepore commented 6 years ago

@gonzalezeb,

Thanks for cleaning the column names. I wrote a little function that will help us read the master dataset more safely -- with an explicit specification of what type to expect from each column. Whenever you can, could you please check HERE that the column types I specified are correct? (Feel free to edit in place -- directly from GitHub.)
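For illustration only, a reader along these lines could look like the sketch below. It is not the function linked above: read_master(), types_example, and the CSV contents are hypothetical, and the choice to stop on any parsing problem is an assumption about what "more safely" should mean.

```r
library(readr)

# Hypothetical specification; the real one is the confirmed list of types
types_example <- list(species = "c", dbh_min_cm = "d", n_trees = "i")

# Hypothetical helper: read with explicit types and fail on parsing problems
read_master <- function(path, types) {
  out <- read_csv(path, col_types = do.call(cols, types))
  issues <- problems(out)
  if (nrow(issues) > 0) {
    stop("Found ", nrow(issues), " parsing problem(s); see readr::problems().")
  }
  out
}

# Small temporary file standing in for the master data
path <- tempfile(fileext = ".csv")
writeLines(c("species,dbh_min_cm,n_trees", "Acer rubrum,1.5,12"), path)

master_example <- read_master(path, types_example)
vapply(master_example, function(x) class(x)[1], character(1))
```

Building the specification with cols() from the named list keeps the expected types in one place, so a column that stops matching its declared type surfaces as an error instead of a silent NA.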

Reference:

gonzalezeb commented 6 years ago

Column types are correct.