ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Error with add_basic_meta #143

Closed sckott closed 8 years ago

sckott commented 8 years ago

Was in the process of updating all our tutorials on the website and ran into this problem:

data("bird.orders")
birds <- add_trees(bird.orders)
birds <- add_basic_meta(
  title = "Phylogeny of the Orders of Birds From Sibley and Ahlquist",

  description = "This data set describes the phylogenetic relationships of the
     orders of birds as reported by Sibley and Ahlquist (1990). Sibley
     and Ahlquist inferred this phylogeny from an extensive number of
     DNA/DNA hybridization experiments. The ``tapestry'' reported by
     these two authors (more than 1000 species out of the ca. 9000
     extant bird species) generated a lot of debates.

     The present tree is based on the relationships among orders. The
     branch lengths were calculated from the values of Delta T50H as
     found in Sibley and Ahlquist (1990, fig. 353).",

  citation = "Sibley, C. G. and Ahlquist, J. E. (1990) Phylogeny and
     classification of birds: a study in molecular evolution. New
     Haven: Yale University Press.",

  creator = "Sibley, C. G. and Ahlquist, J. E.",
    nexml=birds)

When trying to print the birds object I get

Error: Unknown column 'content'

just me?

Packages ---------------------------------------------------------------------------------------------------------------------------------------------------
 package    * version    date       source                                
 ape        * 3.4        2015-11-29 CRAN (R 3.2.3)                        
 assertthat   0.1        2013-12-06 CRAN (R 3.2.0)                        
 bold         0.3.5      2016-03-28 local (ropensci/bold)                 
 codetools    0.2-14     2015-07-15 CRAN (R 3.2.5)                        
 data.table   1.9.7      2016-02-29 Github (Rdatatable/data.table@dde418f)
 DBI          0.4        2016-05-02 CRAN (R 3.2.5)                        
 devtools   * 1.11.1     2016-04-21 CRAN (R 3.2.5)                        
 digest       0.6.9      2016-01-08 CRAN (R 3.2.3)                        
 dplyr        0.4.3.9001 2016-04-22 Github (hadley/dplyr@c67fa89)         
 foreach      1.4.3      2015-10-13 CRAN (R 3.2.2)                        
 httr         1.1.0      2016-01-28 CRAN (R 3.2.3)                        
 iterators    1.0.8      2015-10-13 CRAN (R 3.2.2)                        
 jsonlite     0.9.19     2015-11-28 CRAN (R 3.2.2)                        
 lattice      0.20-33    2015-07-14 CRAN (R 3.2.5)                        
 lazyeval     0.1.10     2015-01-02 CRAN (R 3.2.0)                        
 magrittr     1.5        2014-11-22 CRAN (R 3.2.0)                        
 memoise      1.0.0      2016-01-29 CRAN (R 3.2.3)                        
 nlme         3.1-127    2016-04-16 CRAN (R 3.2.5)                        
 plyr         1.8.3      2015-06-12 CRAN (R 3.2.0)                        
 R6           2.1.2      2016-01-26 CRAN (R 3.2.3)                        
 Rcpp         0.12.4     2016-03-26 CRAN (R 3.2.4)                        
 rentrez      1.0.2      2016-04-21 CRAN (R 3.2.5)                        
 reshape      0.8.5      2014-04-23 CRAN (R 3.2.0)                        
 reshape2     1.4.1      2014-12-06 CRAN (R 3.2.0)                        
 rncl         0.6.0      2015-07-20 Github (fmichonneau/rncl@66bf2c8)     
 RNeXML     * 2.0.6      2016-03-07 CRAN (R 3.2.3)                        
 rotl         3.0.0      2016-04-26 CRAN (R 3.2.5)                        
 rredlist     0.1.0.9000 2016-03-07 local (ropenscilabs/rredlist@d9f4f38) 
 rsconnect    0.4.3      2016-05-02 CRAN (R 3.2.5)                        
 stringi      1.0-1      2015-10-22 CRAN (R 3.2.2)                        
 stringr      1.0.0      2015-04-30 CRAN (R 3.2.0)                        
 taxize       0.7.5.9000 2016-04-28 local (ropensci/taxize)               
 tibble       1.0        2016-03-23 CRAN (R 3.2.4)                        
 tidyr        0.4.1      2016-02-05 CRAN (R 3.2.3)                        
 uuid         0.1-2      2015-07-28 CRAN (R 3.2.0)                        
 withr        1.0.1      2016-02-04 CRAN (R 3.2.3)                        
 XML          3.98-1.4   2016-03-01 CRAN (R 3.2.3)                        
 xml2         0.1.2      2015-09-01 CRAN (R 3.2.0)

cboettig commented 8 years ago

Weird, I can't replicate. I see:

> birds
A nexml object representing:
     1 phylogenetic tree blocks, where: 
     block 1 contains 1 phylogenetic trees 
     9 meta elements 
     0 character matrices 
     23 taxonomic units 
 Taxa:   Struthioniformes, Tinamiformes, Craciformes, Galliformes, Anseriformes, Turniciformes ... 

 NeXML generated by RNeXML using schema version: 0.9 
 size: 312.8 Kb 

with sessionInfo:

> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RNeXML_2.0.6 ape_3.4     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4       xml2_0.1.2        magrittr_1.5      uuid_0.1-2        rotl_0.5.0.901    lattice_0.20-33   R6_2.1.2          foreach_1.4.3     dplyr_0.4.3       stringr_1.0.0    
[11] httr_1.1.0        plyr_1.8.3        tools_3.2.5       parallel_3.2.5    rredlist_0.1.0    bold_0.3.5        grid_3.2.5        data.table_1.9.6  nlme_3.1-127      DBI_0.3.1        
[21] iterators_1.0.8   lazyeval_0.1.10   assertthat_0.1    taxize_0.7.5.9000 tidyr_0.4.1       reshape2_1.4.1    codetools_0.2-14  stringi_1.0-1     rncl_0.6.0        XML_3.98-1.4     
[31] jsonlite_0.9.19   reshape_0.8.5     chron_2.3-47 

Wild guessing it might be related to the dev version of dplyr you are running? Can you show the error trace?

sckott commented 8 years ago

I'll check dplyr version

sckott commented 8 years ago

yeah, that's it, works fine with the current CRAN version of dplyr. I guess closing this is good, or maybe there's something to fix for the upcoming dplyr on CRAN

On the add_basic_meta() call I get this (with trace below the error):

#> Error: Unknown column 'content' 
#>   stop("Unknown column '", i, "'", call. = FALSE) 
#>   `$.tbl_df`(mymeta, content) 
#>   mymeta$content

cboettig commented 8 years ago

@sckott Sounds like we better fix this for the upcoming dplyr anyhow. Not sure what the issue is though. Can you drop into debug and see if mymeta still has a column called content? It looks like it should have a column called content and a column called property (it gets these names from the xml attributes in parsing the NeXML). Maybe "content" is now protected somehow and we just need to do mymeta[["content"]] instead to access the column?

sckott commented 8 years ago

right, can do

sckott commented 8 years ago

Getting error now on calls to new("nexml"), e.g.,

mymeta$content
#> Error: Unknown column 'property'

the new data frames used by dplyr, tibble, and tidyr all (I think) throw an error when you access a column that doesn't exist, instead of returning NULL like a base data.frame does

sckott commented 8 years ago

@cboettig I think that's the main problem, how do you prefer to handle it? Seems we could catch every call to a column name and give back NULL when the column doesn't exist. e.g.,

`%||%` <- function(x, y) {
  if (inherits(x, "error")) y else x
}

tryCatch(mymeta$content, error = function(e) e) %||% NULL

if that column content doesn't exist, we get a NULL
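An alternative to catching the error, sketched here only as an illustration: check for the column name before accessing it. The helper name `col_or_null` and the sample `mymeta` data are hypothetical and not part of RNeXML; the sketch uses a base data.frame and assumes only that `names()` and `[[` behave the same way on the newer data frame classes.

```r
# Hypothetical helper (not part of RNeXML): return the column if present,
# otherwise NULL, without relying on how `$` treats unknown names.
col_or_null <- function(df, name) {
  if (name %in% names(df)) df[[name]] else NULL
}

# Made-up stand-in for the parsed meta table; it has no `content` column.
mymeta <- data.frame(
  property = c("dc:title", "dc:creator"),
  stringsAsFactors = FALSE
)

col_or_null(mymeta, "property")  # the existing column
col_or_null(mymeta, "content")   # NULL, no error raised
```

This avoids redefining `%||%`, which other packages commonly use as a null-default operator, but it has the same limitation as the `tryCatch()` version: it hides a missing column rather than explaining why it is missing.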

cboettig commented 8 years ago

Hmm, I don't think that will fix it in general; this arises in places other than the content column on the mymeta data.frame. I believe that function calls get_level(), which calls some tidyr function.

I think we need to do the traceback and figure out which tidyr call has changed and how (partly to make sure we are not just losing some data in the process). Sorry, I know debugging someone else's code is tricky, I may get around to this eventually! I think just going through the error call stack on an R instance with dev tidyr and comparing to an R instance with CRAN tidyr should identify the tidyr return that changed; there aren't many tidyr calls and I think I have them namespaced.
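One way to record the call stack at the point of failure, so it can be compared across the two R installations, is sketched below. The functions `strict_col`, `get_meta_content`, and `read_meta` are hypothetical stand-ins, not RNeXML or tidyr functions; only base R is used.

```r
# Hypothetical accessor that errors on unknown names, mimicking the
# stricter behaviour reported in this thread.
strict_col <- function(df, name) {
  if (!name %in% names(df)) stop("Unknown column '", name, "'", call. = FALSE)
  df[[name]]
}

# Hypothetical stand-ins for the nested package calls.
get_meta_content <- function(df) strict_col(df, "content")
read_meta <- function(df) get_meta_content(df)

captured <- NULL
result <- tryCatch(
  withCallingHandlers(
    read_meta(data.frame(property = "dc:title")),
    # A calling handler runs before the stack unwinds, so sys.calls()
    # still contains the frames that led to the error.
    error = function(e) captured <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

# First line of each recorded call, as plain text that can be diffed.
stack <- vapply(captured, function(cl) deparse(cl)[1], character(1))
print(result)
print(stack)
```

Interactively, calling `traceback()` right after the error, or setting `options(error = recover)` beforehand, gives the same information without any wrapper code.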

sckott commented 8 years ago

okay, i'll keep digging, and look for tidyr calls specifically

sckott commented 8 years ago

@cboettig still tweaking travis builds trying to get them to work, not sure how to get dev version of devtools installed on travis

hlapp commented 8 years ago

It looks like I'm getting this same error from rphenoscape:

> library(rphenoscape)
> library(RNeXML)
Loading required package: ape
Warning messages:
1: package ‘RNeXML’ was built under R version 3.2.4 
2: package ‘ape’ was built under R version 3.2.5 
> f <- system.file("examples", "trees.xml", package="RNeXML")
> nexml_read(f)
Error: Unknown column 'content'
> 
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rphenoscape_0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5      xml2_0.1.2       magrittr_1.5     uuid_0.1-2       rotl_3.0.0      
 [6] ape_3.5          lattice_0.20-33  R6_2.1.2         foreach_1.4.3    dplyr_0.4.3     
[11] stringr_1.0.0    rentrez_1.0.2    httr_1.2.0       plyr_1.8.4       tools_3.2.3     
[16] parallel_3.2.3   rredlist_0.1.0   bold_0.3.5       RNeXML_2.0.6     grid_3.2.3      
[21] data.table_1.9.6 nlme_3.1-128     DBI_0.4-1        iterators_1.0.8  lazyeval_0.2.0  
[26] assertthat_0.1   tibble_1.0       taxize_0.7.8     tidyr_0.5.1      reshape2_1.4.1  
[31] codetools_0.2-14 stringi_1.1.1    rncl_0.6.0       XML_3.98-1.4     jsonlite_0.9.22 
[36] reshape_0.8.5    chron_2.3-47    
>

I come here because the rphenoscape vignette fails to build now:

==> devtools::check(document = FALSE)

Setting env vars ---------------------------------------------------------------
CFLAGS  : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building rphenoscape -----------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ  \
  --no-save --no-restore --quiet CMD build '/Users/lapp/Projects/rphenoscape'  \
  --no-resave-data --no-manual 

* checking for file ‘/Users/lapp/Projects/rphenoscape/DESCRIPTION’ ... OK
* preparing ‘rphenoscape’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Note: the specification for S3 class "AsIs" in package 'DBI' seems equivalent to one from package 'jsonlite': not turning on duplicate class definitions for this class.
Quitting from lines 49-50 (rphenoscape.Rmd) 
Error: processing vignette 'rphenoscape.Rmd' failed with diagnostics:
no applicable method for 'nexml_read' applied to an object of class "c('xml_document', 'xml_node')"
Execution halted
Error: Command failed (1)
Execution halted

Exited with status 1.

Lines 49-50 in rphenoscape.Rmd are this:

nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")

The help for nexml_read says pretty clearly that xml_document and xml_node are OK and supported, so perhaps the error message is bogus and it's really the Error: Unknown column 'content' issue that's causing this?

I'm slated to present this on Tuesday at Evolution, so this is not welcome news ...

cboettig commented 8 years ago

@hlapp I think this is already fixed in 2.0.7. This error was due to a change in the tidyr package dependency.

Can you be sure to install the latest version of this package with devtools::install_github?

hlapp commented 8 years ago

@cboettig assuming you mean RNeXML with v2.0.7, which branch would that be? master shows as v2.0.6.

cboettig commented 8 years ago

Oh right, haven't pushed the bump in version yet since CRAN hasn't accepted it (waiting behind the dplyr release which just went to CRAN). But installing from GitHub should still fix the issue

hlapp commented 8 years ago

That indeed made nexml_read() work on a file:

> library(RNeXML)
Loading required package: ape
Warning message:
package ‘ape’ was built under R version 3.2.5 
> f <- system.file("examples", "trees.xml", package="RNeXML")
> nexml_read(f)
A nexml object representing:
     1 phylogenetic tree blocks, where: 
     block 1 contains 2 phylogenetic trees 
     5 meta elements 
     0 character matrices 
     5 taxonomic units 
 Taxa:   species 1, species 2, species 3, species 4, species 5 ... 

 NeXML generated by RNeXML using schema version: 0.9 
 size: 130.3 Kb 
>

But the issue raised in the rphenoscape code and vignette remains:

> library("rphenoscape")
> nex <- pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
Error in UseMethod("nexml_read") : 
  no applicable method for 'nexml_read' applied to an object of class "c('xml_document', 'xml_node')"
> traceback()
2: nexml_read(out) at pk_get_xml.R#56
1: pk_get_ontotrace_xml(taxon = c("Ictalurus", "Ameiurus"), entity = "fin spine")
> 

Any thoughts why? Is this a separate issue I should post separately on the tracker?

hlapp commented 8 years ago

Looks like xml2 has changed its return value. I'll post this as a separate issue.
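For context, the "no applicable method" message is what S3 dispatch produces when none of the classes on an object has a registered method. The sketch below reproduces that with a hypothetical generic `read_doc`; it is not RNeXML's `nexml_read`, and the class vector is simply copied from the error message above.

```r
# Hypothetical generic with a single method for character input (file paths).
read_doc <- function(x, ...) UseMethod("read_doc")
read_doc.character <- function(x, ...) paste("reading file:", x)

# Fake object carrying the classes named in the error message.
doc <- structure(list(), class = c("xml_document", "xml_node"))

# Dispatch tries read_doc.xml_document, then read_doc.xml_node, then
# read_doc.default; none of them exists, so UseMethod() signals an error.
msg <- tryCatch(read_doc(doc), error = function(e) conditionMessage(e))
print(msg)

# Adding a method for one of the object's classes makes dispatch succeed.
read_doc.xml_document <- function(x, ...) "reading parsed xml_document"
print(read_doc(doc))
```

If an upstream function starts returning objects of a different class than the generic's methods cover, this is the error that would appear, which fits the guess that xml2's return value changed.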

cboettig commented 8 years ago

Think we can close this now too