ropensci / phylocomr

Phylocom R interface
https://docs.ropensci.org/phylocomr

Segmentation fault with zanne2014 tree #18

Closed daijiang closed 7 years ago

daijiang commented 7 years ago

Hi Scott, I am using the phylocomr package to get a phylogeny for >10k species based on the R20120829 and zanne2014 phylogenies. I have a couple of questions, which are not really related to the technical side of this package (I apologize for this).

  1. I assume that the generated tree will be the same if I use phylocomr, or brranching::phylomatic_local or phylomatic (shell), correct?
  2. When I get the phylogeny from zanne2014, I get a Segmentation fault error when analyzing it with a sample file; however, the phylogeny from R20120829 works fine with the same sample file. I am sure the problem is in the phylogeny file, since the sample file is the same. However, the source code is written in C, which makes it hard for me to figure out why. I wonder if you can help me out with this? Thank you very much.

I attached the sample file (simplified) and the two phylogenies.

ph_pd(sample = "sample.txt", phylo = "zanne.txt")
## Error: Program '/Users/dli/R/phylocomr/bin//phylocom' terminated by SIGNAL (Segmentation fault: 11)
ph_pd(sample = "sample.txt", phylo = "apg.txt")
## A tibble: 3 x 5
## ...

sample.txt zanne.txt apg.txt

Thanks for all your work. Daijiang

sckott commented 7 years ago

hi @daijiang thanks for the issue! 👍

will have a look and get back to you very soon

note also that there seem to be memory leaks or similar in the C library, which lead to random errors, so code works sometimes and not others - does that error always happen?

can you please also share your session info

daijiang commented 7 years ago

Thanks Scott!

This error always happens. Here is my session info:

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5    

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib    

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base         

other attached packages:
[1] phylocomr_0.1.0    

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0    tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11   rlang_0.1.1    sys_1.3       

daijiang commented 7 years ago

I found the same issue in phylocom's GitHub repo. I am sure that the problem in my 2nd question is a phylocom problem.

https://github.com/phylocom/phylocom/issues/13

sckott commented 7 years ago

I assume that the generated tree will be the same if I use phylocomr, or brranching::phylomatic_local or phylomatic (shell), correct?

should be, AFAIK

Where does the zanne tree come from? It has lots of node labels set to NA, perhaps that's the problem.

daijiang commented 7 years ago

Thanks @sckott for looking into this. The zanne tree is from Phylomatic. The NA labels do not actually matter: I have tried removing all of them and the tree still does not work.

It turns out that the "raw" tree generated by Phylomatic usually works with downstream analyses using Phylomatic. But the "raw" tree has many singleton nodes, and ape::read.tree throws errors when you try to read it into R. If I use the tree as is, it is fine for Phylomatic (but not for R); if I read it into R using phytools::read.newick, clean the singleton nodes, and save it, then the saved tree is fine for R (but not for Phylomatic). This is what I have learned so far.

So I just use both the "raw" version (for Phylomatic) and the "cleaned" version (for R); so far everything works, it is just a bit fragmented. I hope Phylomatic will update its cleanphy function soon to clean the singleton nodes better.
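
To make the "singleton node" point concrete, here is a minimal sketch in base R of how one might check whether a tree contains internal nodes with only one child. The helper name find_singletons and the toy edge matrix are made up for illustration; the only assumption is the ape-style edge layout (column 1 = parent node, column 2 = child node).

```r
# Hypothetical helper (not part of ape or phylocomr): return the internal
# nodes that have exactly one child, given an ape-style edge matrix
# (column 1 = parent node, column 2 = child node).
find_singletons <- function(edge) {
  child_counts <- table(edge[, 1])
  as.integer(names(child_counts)[child_counts == 1])
}

# Toy 3-tip tree: tips A = 1, B = 2, C = 3; root = 4.
# Node 5 is a singleton: its only child is node 6, the parent of A and B.
edge <- matrix(c(4, 5,
                 5, 6,
                 6, 1,
                 6, 2,
                 4, 3), ncol = 2, byrow = TRUE)

singles <- find_singletons(edge)
print(singles)
```

For a tree read with phytools::read.newick, the same check could be run on tree$edge; an empty result would suggest there are no singleton nodes left to collapse.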

sckott commented 7 years ago

Thanks for the update, I've been looking at this today. If only we had informative error messages we could diagnose this faster; I'm working on finding where the error occurs.

if I read it into R using phytools::read.newick and clean the singleton nodes

How is the "clean the singleton nodes" step done?

sckott commented 7 years ago

issue opened https://github.com/phylocom/phylocom/issues/26

daijiang commented 7 years ago

It was done with ape::collapse.singles().

phytools::read.newick(text = ) %>% ape::collapse.singles() %>% ape::ladderize() %>% ape::write.tree()

daijiang commented 7 years ago

Try this phylogeny generated by Phylomatic: zanne_phylocom.txt.

veg = read.table("https://github.com/ropensci/phylocomr/files/1053327/sample.txt")
veg$V3 = tolower(veg$V3)
# this works
phylocomr::ph_pd(sample = veg, phylo = readLines("https://github.com/ropensci/phylocomr/files/1070084/zanne_phylocom.txt")[1]) 
# # A tibble: 3 x 5
#   sample ntaxa       pd   treebl proptreebl
#     <chr> <int>    <dbl>    <dbl>      <dbl>
# 1   CORO    42 2901.513 394429.2      0.007
# 2   DEPO     8  831.123 394429.2      0.002
# 3   GRKO    86 3400.847 394429.2      0.009

# after cleaning with R, it does not
tree1 = ape::collapse.singles(phytools::read.newick("https://github.com/ropensci/phylocomr/files/1070084/zanne_phylocom.txt"))
phylocomr::ph_pd(sample = veg, phylo = ape::write.tree(tree1))
# Error: Program '/Users/dli/R/phylocomr/bin//phylocom' terminated by SIGNAL (Segmentation fault: 11)

sckott commented 7 years ago

thanks @daijiang, pinged the phylocom issue with this example for additional info

sckott commented 7 years ago

@daijiang can you try it again after reinstalling?

daijiang commented 7 years ago

Thanks @sckott for fixing this!

veg = read.table("https://github.com/ropensci/phylocomr/files/1053327/sample.txt")
veg$V3 = tolower(veg$V3)
tree1 = ape::collapse.singles(phytools::read.newick("https://github.com/ropensci/phylocomr/files/1070084/zanne_phylocom.txt"))
phylocomr::ph_pd(sample = veg, phylo = ape::write.tree(tree1)) # does not work
phylocomr::ph_pd(sample = veg, phylo = tree1) # does work
phylocomr::ph_pd(sample = veg, phylo = phylocomr:::write_tree_(tree1)) # does work

So, yes, it works!

sckott commented 7 years ago

Great!