ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0

Export and document data #29

Closed maurolepore closed 6 years ago

maurolepore commented 6 years ago

This issue describes the process by which I export and document data. You can track this process by searching for commits tagged with the number of this issue (#29).

maurolepore commented 6 years ago

@gonzalezeb,

The code that exports data contains this chunk.

# eliminate rows where fam or sp is unknown #use unique(allo_main$species)
master <- subset(master, family != "Unkown")
# change name of "equation" column to "equation_form"

1. Do you still want to filter master to exclude rows where sp is "Unknown"?

Your comment suggests so, but the code doesn't do that. Please update the code to remove the ambiguity. If you express your goal in code, you need not repeat it in a comment.

This example may help you write the code you want:

dfm <- data.frame(a = 1:3, b = 3:1)
subset(dfm, a >= 2 & b <= 2)
#>   a b
#> 2 2 2
#> 3 3 1
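
Applied to the export script, the same pattern might look like the sketch below. The data frame is a hypothetical stand-in for master; the column names family and species and the label "Unknown" are assumptions, so the literal must be adjusted to match whatever spelling the real data uses.

```r
# Hypothetical stand-in for `master`; the real columns and labels may differ
master <- data.frame(
  family = c("Fagaceae", "Unknown", "Pinaceae"),
  species = c("Quercus alba", "Quercus rubra", "Unknown"),
  stringsAsFactors = FALSE
)

# Keep only rows where both the family and the species are known
known <- subset(master, family != "Unknown" & species != "Unknown")
known
```

With both conditions in the call to subset(), the code states the goal itself and the comment becomes unnecessary.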

2. Do you still want to rename the column equation as equation_form?

Although the comment suggests this, the code doesn't do it. Please resolve this ambiguity.

This is already done by SHA 0e985e09.
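
For reference, renaming a column by its name in base R can be sketched as follows. This is only an illustration, not necessarily how the commit does it; the data frame and its values are hypothetical.

```r
# Hypothetical stand-in for `master`; the values are made up for illustration
master <- data.frame(equation = "a * dbh^b", family = "Fagaceae")

# Rename by matching the old name, not by position
names(master)[names(master) == "equation"] <- "equation_form"
names(master)
```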

maurolepore commented 6 years ago

@gonzalezeb,

For safety I've changed the script that exports data to subset the master dataset not by column position but by column name. Please review and ensure that the selected columns are the ones you intended.

Subsetting objects by position is too risky to do in a script. It's OK for interactive use but not for programming, because the order of the columns can change too easily and the script will be blind to such a change. For programming it is important to subset data by column name. If the names change, the script will flag the change with an error. This is desirable because it makes the program safer.

Consider the following example:

dfm <- data.frame(x = 1, y = 2, z = 3)

# Say that you have something like this
selected_names <- names(dfm)[c(1, 3)]

# You can get a vector of names with `datapasta::dpasta()`
datapasta::dpasta(selected_names)

# This is the result
c("x", "z")

# Now you can use this vector to subset `dfm`
dfm_cols <- c("x", "z")
dfm[dfm_cols]
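
To make the risk concrete, here is a minimal sketch of what happens when the columns change. The data frame changed is hypothetical; it simulates a version of dfm whose columns were reordered and where z was renamed to w.

```r
# Simulate an upstream change: columns reordered and `z` renamed to `w`
changed <- data.frame(y = 2, x = 1, w = 3)

# By position: runs without complaint but now selects `y` and `w`
by_position <- changed[c(1, 3)]
names(by_position)

# By name: fails loudly because `z` no longer exists;
# tryCatch() captures the error message so the script can continue
by_name <- tryCatch(
  changed[c("x", "z")],
  error = function(e) conditionMessage(e)
)
by_name
```

The positional subset silently returns the wrong columns, while the subset by name stops with an error that points at the problem.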

maurolepore commented 6 years ago

@gonzalezeb,

Is the name of the file "allotemp_main.csv" expressive enough? I feel that "temp" is a suffix that may be misleading, suggesting the data is not important but temporary. I like "_main", although I think you should pick only one of "master" or "main". I suggest the name "allodb_master_data".

maurolepore commented 6 years ago

@gonzalezeb

Please review data-raw/data.Rmd. Some of the notes use informal abbreviations (e.g. sp). Could you please rewrite them, now considering that the document is not for personal use but to be shared with users? I think it is OK to abbreviate words that stand for a concept; for example, it is OK to use sp. to abbreviate the concept species. But to refer not to the concept "species" but to an R object called species, I suggest you spell the word out and type `species` (and best is to use code syntax as I do here).