ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogenous data
https://docs.ropensci.org/EML

set_attributes not working when measurementScale and domain specified #278

Closed aammd closed 4 years ago

aammd commented 5 years ago

Hello! Either there is a bug in set_attributes, or I am misunderstanding the documentation. The code below produces a particularly inscrutable error message. Is this the correct way to specify attributes? Is it the presence of NA values in the attributes table that causes the problem?

library(EML)
library(tibble)
#> Warning: package 'tibble' was built under R version 3.5.2

custom_units <- 
  rbind(
    data.frame(id = "micromolePerMeterSquaredPerSecond", 
               unitType = "luminosity", 
               parentSI = "numberPerSquareMeterPerSecond",
               multiplierToSI = 1, 
               description = "number of micromoles of photons per square meter per second"),
    data.frame(id = "wattPerMeterSquared", 
               unitType = "luminosity", 
               parentSI = "numberPerSquareMeter",
               multiplierToSI = 1, 
               description = "watts per square meter")
  )

unitList <- set_unitList(custom_units)

attributes <- tribble(
  ~ attributeName,                    ~ attributeDefinition,                    ~measurementScale,    ~domain,                    ~formatString,      ~ unit,                                    ~definition,
  "station",                            "weather station ID",                   "nominal",                "textDomain"     ,      NA,                        NA,                                      "the name of the station",      
  "year",                               "Date of sampling",                     "interval",             "numericDomain"     ,      NA,                         "nominalYear",                          NA,  
  "timestamp",                          "time on a 24 hour clock",              "interval",                "numericDomain"     ,      "YYYY-MM-DDTHH:MM:SS",      "nominalYear",                       NA,  
  "wind_speed",                         "the speed of the wind",                "ratio",                "numericDomain"     ,       NA,                      "metersPerSecond",                        NA,  
  "wind_direction",                     "compass degree of the wind",           "interval",             "numericDomain"     ,      NA,                           "degree",                             NA,     
  "temperature",                        "temperature of the air",               "interval",             "numericDomain"     ,     NA,                           "celsius",                             NA,
  "relative_humidity",                  "Relative humidity in percent",         "ratio",                "numericDomain"     ,      NA,                          "dimensionless",                       NA,  
  "pressure",                           "Air pressure",                         "ratio",                "numericDomain"     ,      NA,                          "kilopascal",                          NA,  
  "UVB",                                "ultraviolet B radiation",              "ratio",                "numericDomain"     ,      NA,                          "wattPerMeterSquared",                 NA,                
  "rain",                               "Rainfall in mm",                       "ratio",                "numericDomain"     ,      NA,                          "millimeter",                          NA,          
  "soil_temperature",                   "temperature of soil",                  "interval",                "numericDomain"     ,      NA,                          "celsius",                             NA,           
  "sensor_temp",                        "temperature of UV sensor",             "interval",                "numericDomain"     ,      NA,                          "celsius",                             NA,                    
  "photosynthesis_active_radiation",    "Photosynthetically active radiation",  "ratio",                "numericDomain"     ,       NA,                       "micromolePerMeterSquaredPerSecond",     NA                                
)            

# year, day, wind , time of day

# create attributes
attributeList <- set_attributes(attributes = attributes)
#> Warning in rep(no, length.out = length(ans)): 'x' is NULL so the result
#> will be NULL
#> Error in ans[!test & ok] <- rep(no, length.out = length(ans))[!test & : replacement has length zero

Created on 2019-06-14 by the reprex package (v0.2.0).

aammd commented 5 years ago

I've traced the problem to set_attribute, which stops the execution of set_attributes if the unit is a customUnit. Shouldn't it be possible for set_attributes to be aware of custom units? Or should set_attribute give a message rather than a warning?

jeanetteclark commented 5 years ago

Hi Andrew,

That is a really strange error message! Unfortunately I wasn't able to reproduce your error using the example. I get the expected warnings for the custom units (the execution is not stopped).

Warning messages:
1: In set_attribute(attributes[i, ], factors = factors) :
  unit 'wattPerMeterSquared' is not recognized, using custom unit.
          Please define a custom unit or replace with a
          recognized standard unit (see set_unitList() for details)
2: In set_attribute(attributes[i, ], factors = factors) :
  unit 'micromolePerMeterSquaredPerSecond' is not recognized, using custom unit.
          Please define a custom unit or replace with a
          recognized standard unit (see set_unitList() for details)

Here is my session info if you want to compare:

R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tibble_2.1.3  EML_2.0.0     raster_2.8-19 sp_1.3-1     

loaded via a namespace (and not attached):
 [1] compiler_3.6.0   tools_3.6.0      pillar_1.4.1     rstudioapi_0.10  crayon_1.3.4     Rcpp_1.0.1       uuid_0.1-2       xml2_1.2.0       codetools_0.2-16 grid_3.6.0       pkgconfig_2.0.2  rlang_0.3.4     
[13] lattice_0.20-38 

clnsmth commented 5 years ago

Hi @jeanetteclark,

I was able to reproduce @aammd's error on my Windows OS:

Error in ans[!test & ok] <- rep(no, length.out = length(ans))[!test &  : 
  replacement has length zero
In addition: Warning message:
In rep(no, length.out = length(ans)) :
  'x' is NULL so the result will be NULL

Session info:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] tibble_2.1.3 EML_2.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       rstudioapi_0.10  knitr_1.23      
 [4] xml2_1.2.0       magrittr_1.5     uuid_0.1-2      
 [7] tidyselect_0.2.5 R6_2.4.0         rlang_0.3.4     
[10] dplyr_0.8.1      tools_3.5.2      xfun_0.7        
[13] jqr_1.1.0        htmltools_0.3.6  digest_0.6.19   
[16] yaml_2.2.0       lazyeval_0.2.2   assertthat_0.2.1
[19] jsonld_2.1       crayon_1.3.4     purrr_0.3.2     
[22] curl_3.3         emld_0.2.0       glue_1.3.1      
[25] evaluate_0.14    rmarkdown_1.13   V8_2.2          
[28] compiler_3.5.2   pillar_1.4.1     jsonlite_1.6    
[31] pkgconfig_2.0.2 

traceback()

6: ifelse(is.na(row[["unit"]]), "For unitless values, use \"dimensionless\" as the unit. ", 
       NULL)
5: warning("Unit '", row[["unit"]], "' is not a recognized standard unit; treating as custom unit. ", 
       "Please be sure you also define a custom unit in your EML record, ", 
       "or replace with a recognized standard unit. ", ifelse(is.na(row[["unit"]]), 
           "For unitless values, use \"dimensionless\" as the unit. ", 
           NULL), "See set_unitList() for details.")
4: set_attribute(attributes[i, ], factors = factors, missingValues = missingValues)
3: FUN(X[[i]], ...)
2: lapply(1:dim(attributes)[1], function(i) set_attribute(attributes[i, 
       ], factors = factors, missingValues = missingValues))
1: set_attributes(attributes = attributes)

I then tried a few fresh sessions, to no avail, and then reinstalled EML and tibble. After restarting my session one last time, the error stopped. It doesn't make sense that the reinstall would have helped, since the package versions were the same in the erroring and non-erroring sessions.

Warning messages:
1: In set_attribute(attributes[i, ], factors = factors, missingValues = missingValues) :
  Unit 'wattPerMeterSquared' is not a recognized standard unit; treating as custom unit. Please be sure you also define a custom unit in your EML record, or replace with a recognized standard unit. See set_unitList() for details.
2: In set_attribute(attributes[i, ], factors = factors, missingValues = missingValues) :
  Unit 'micromolePerMeterSquaredPerSecond' is not a recognized standard unit; treating as custom unit. Please be sure you also define a custom unit in your EML record, or replace with a recognized standard unit. See set_unitList() for details.

Session info:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] tibble_2.1.3 EML_2.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       rstudioapi_0.10  knitr_1.23      
 [4] xml2_1.2.0       magrittr_1.5     uuid_0.1-2      
 [7] tidyselect_0.2.5 R6_2.4.0         rlang_0.3.4     
[10] dplyr_0.8.1      tools_3.5.2      xfun_0.7        
[13] jqr_1.1.0        htmltools_0.3.6  digest_0.6.19   
[16] yaml_2.2.0       lazyeval_0.2.2   assertthat_0.2.1
[19] jsonld_2.1       crayon_1.3.4     purrr_0.3.2     
[22] curl_3.3         emld_0.2.0       glue_1.3.1      
[25] evaluate_0.14    rmarkdown_1.13   V8_2.2          
[28] compiler_3.5.2   pillar_1.4.1     jsonlite_1.6    
[31] pkgconfig_2.0.2 

cboettig commented 5 years ago

Thanks @aammd, @clnsmth, and @jeanetteclark. I was able to reproduce this on my Windows machine in R 3.5.2; after I updated to 3.6.0 (which also involved updating my R package library), I can no longer reproduce it. It may be related to something in base R's ifelse on Windows in R 3.5.2 or thereabouts (at least that's where the traceback leaves me, and I'm not seeing anything obviously wrong), but it's hard to be sure.

In any event, I recommend trying with the current version of R and seeing if that resolves the issue...

lkuiucsb commented 4 years ago

Today, I was able to reproduce the error message in macOS Mojave.

Session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tibble_2.1.3 EML_2.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2 rstudioapi_0.10 knitr_1.24 xml2_1.2.2 magrittr_1.5 uuid_0.1-2 tidyselect_0.2.5 R6_2.4.0 rlang_0.4.0 dplyr_0.8.3.9000 tools_3.6.0 xfun_0.9
[13] jqr_1.1.0 htmltools_0.3.6 digest_0.6.20 yaml_2.2.0 lazyeval_0.2.2 assertthat_0.2.1 jsonld_2.1 crayon_1.3.4 purrr_0.3.2 curl_4.0 emld_0.2.0 glue_1.3.1
[25] evaluate_0.14 rmarkdown_1.15 V8_2.3 compiler_3.6.0 pillar_1.4.2 jsonlite_1.6 pkgconfig_2.0.2

I have the latest versions of R and the EML package.

However, on another computer running Windows with the same tibble and EML versions, there was no error at all (only the warning message about custom units). Not sure what is going on here.

cboettig commented 4 years ago

Still can't reproduce the bug on my end (tested on macOS Mojave with R 3.6.1 and EML 2.0.0). @lkuiucsb, do you see the warning that Andrew gets about packages being out of date?

#> Warning: package 'tibble' was built under R version 3.5.2

lkuiucsb commented 4 years ago

No warning message about outdated packages, on either Windows or Mac.

lkuiucsb commented 4 years ago

This is the traceback info for the error on Mac:

6: ifelse(is.na(row[["unit"]]), "For unitless values, use \"dimensionless\" as the unit. ", 
       NULL)
5: warning("Unit '", row[["unit"]], "' is not a recognized standard unit; treating as custom unit. ", 
       "Please be sure you also define a custom unit in your EML record, ", 
       "or replace with a recognized standard unit. ", ifelse(is.na(row[["unit"]]), 
           "For unitless values, use \"dimensionless\" as the unit. ", 
           NULL), "See set_unitList() for details.")
4: set_attribute(attributes[i, ], factors = factors, missingValues = missingValues)
3: FUN(X[[i]], ...)
2: lapply(1:dim(attributes)[1], function(i) set_attribute(attributes[i, 
       ], factors = factors, missingValues = missingValues))
1: set_attributes(attributes = attributes)

jeanetteclark commented 4 years ago

For some reason, I am now able to reproduce this with the following session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tibble_2.1.3 EML_2.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       rstudioapi_0.10  knitr_1.24       xml2_1.2.2       magrittr_1.5     uuid_0.1-2       tidyselect_0.2.5
 [8] R6_2.4.0         rlang_0.4.0      dplyr_0.8.3      tools_3.6.1      xfun_0.8         jqr_1.1.0        htmltools_0.3.6 
[15] digest_0.6.20    yaml_2.2.0       lazyeval_0.2.2   assertthat_0.2.1 jsonld_2.1       crayon_1.3.4     purrr_0.3.2     
[22] curl_4.0         emld_0.2.0       glue_1.3.1       evaluate_0.14    rmarkdown_1.15   V8_2.3           compiler_3.6.1  
[29] pillar_1.4.2     jsonlite_1.6     pkgconfig_2.0.2 

I also get the traceback to the ifelse function. I tested on the same system by replacing that ifelse with a simple if statement and the error went away. I still don't know why this happens, but I'm going to submit the PR with the code change.
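
For anyone landing here later, a minimal base-R sketch of why the traceback ends in ifelse. The `unit` value is a stand-in taken from the example above, the hint text is shortened, and this is not necessarily the exact change made in the PR. ifelse() has to return something the same length as its test, so when the test is FALSE and the `no` argument is NULL there is nothing to fill in, and the assignment inside ifelse fails with "replacement has length zero". A plain if/else has no such constraint.

```r
# Stand-in for a non-NA unit value, as in the failing rows above.
unit <- "wattPerMeterSquared"

# ifelse() with no = NULL: the test is FALSE, so ifelse() tries to fill the
# result from rep(NULL, length.out = 1), which is NULL, and errors.
ifelse_result <- tryCatch(
  suppressWarnings(
    ifelse(is.na(unit), "For unitless values, use \"dimensionless\". ", NULL)
  ),
  error = function(e) conditionMessage(e)
)
print(ifelse_result)

# A plain if/else is scalar control flow and can simply yield an empty string.
hint <- if (is.na(unit)) "For unitless values, use \"dimensionless\". " else ""
msg <- paste0(
  "Unit '", unit, "' is not a recognized standard unit; ",
  "treating as custom unit. ", hint, "See set_unitList() for details."
)
print(msg)
```

If this reading is right, the error should appear whenever a row reaches this warning with a non-NA unit, which fits the rows that use custom units. It does not by itself explain why some sessions never hit it.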

cboettig commented 4 years ago

I believe this is now closed by #283