spholmes / F1000_workflow

Problems with generating a phylogenetic tree #38

Open Carlosgenetic-84 opened 3 years ago

Carlosgenetic-84 commented 3 years ago

Hi, this is my first time posting a question on the GitHub forums. I should say that I'm relatively new to R, and in particular to using it to analyze DNA sequences.

I was trying to generate a phylogenetic tree for my phyloseq object, employing the following script:

treeNJ <- NJ(dm)
fit = pml(treeNJ, data=phangAlign)
fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))

Everything was running OK until the final line:

fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE, rearrangement = "stochastic", control = pml.control(trace = 0))

This is where I received the following error message:

Error in if (((ll1 - ll)/ll < control$eps) && rounds > 2) opti <- FALSE : missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The warning messages are:

1: In optimize(f = fn, interval = c(0.1, 500), lower = 0.1, ... : NA/Inf replaced by maximum positive value

What does it mean and what can I do about it? I've been stuck on this issue for a couple of days now and haven't been able to solve it. Can anyone help me out with this? Other users have reported similar issues, but their solutions haven't worked for me.

I'm using a virtual machine with the following specs:

sessionInfo():
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
RAM: 125 GB
HDD: 180 GB
Matrix products: default

locale: [1] LC_COLLATE=Spanish_Chile.1252 [2] LC_CTYPE=Spanish_Chile.1252
[3] LC_MONETARY=Spanish_Chile.1252 [4] LC_NUMERIC=C
[5] LC_TIME=Spanish_Chile.1252

attached base packages: [1] grid stats4 parallel stats graphics [6] grDevices utils datasets methods base

other attached packages: [1] knitr_1.29 SummarizedExperiment_1.18.1 [3] DelayedArray_0.14.0 matrixStats_0.56.0
[5] Biobase_2.48.0 GenomicRanges_1.40.0
[7] GenomeInfoDb_1.24.2 BiocManager_1.30.10
[9] phangorn_2.5.5 ape_5.4
[11] Rcpp_1.0.5 xtable_1.8-4
[13] gridExtra_2.3 plyr_1.8.6
[15] XLConnect_1.0.1 Biostrings_2.56.0
[17] XVector_0.28.0 IRanges_2.22.2
[19] S4Vectors_0.26.1 BiocGenerics_0.34.0

loaded via a namespace (and not attached): [1] lattice_0.20-41 foreach_1.5.0
[3] zlibbioc_1.34.0 rstudioapi_0.11
[5] data.table_1.12.8 Matrix_1.2-18
[7] BiocParallel_1.22.0 stringr_1.4.0
[9] igraph_1.2.5 RCurl_1.98-1.2
[11] bit_1.1-15.2 tinytex_0.24
[13] compiler_4.0.2 xfun_0.15
[15] pkgconfig_2.0.3 biomformat_1.16.0
[17] GenomeInfoDbData_1.2.3 quadprog_1.5-8
[19] codetools_0.2-16 XML_3.99-0.4
[21] crayon_1.3.4 MASS_7.3-51.6
[23] bitops_1.0-6 nlme_3.1-148
[25] jsonlite_1.7.0 gtable_0.3.0
[27] DBI_1.1.0 magrittr_1.5
[29] stringi_1.4.6 reshape2_1.4.4
[31] fastmatch_1.1-0 Rhdf5lib_1.10.0
[33] iterators_1.0.12 tools_4.0.2
[35] ade4_1.7-15 bit64_0.9-7
[37] rhdf5_2.32.2 cluster_2.1.0
[39] rJava_0.9-13

Thanks,
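
The error quoted in the post above is what R raises whenever the condition of an `if` statement evaluates to `NA`. The following base-R sketch reproduces that mechanism in isolation. The numbers are hypothetical stand-ins, not values from phangorn: `ll1` plays the role of a log-likelihood that came back as `NaN`, and `eps` stands in for `control$eps`.

```r
# Hypothetical values standing in for two successive log-likelihoods.
# ll1 is NaN, as it could be after a numerical optimisation step failed.
ll     <- -1234.5
ll1    <- NaN
eps    <- 1e-8   # stand-in for control$eps
rounds <- 3

# Same shape as the condition in the error message; it evaluates to NA.
cond <- ((ll1 - ll) / ll < eps) && rounds > 2

# if (NA) is an error in R, so capture the message instead of stopping.
msg <- tryCatch(
  {
    if (cond) opti <- FALSE
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

This only shows why the `if` fails once a likelihood value is missing; it does not say why the likelihood became `NA/Inf` in the first place, which is what the accompanying `optimize()` warnings point to.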

spholmes commented 3 years ago

Could you please show the size of your current objects? There may be a memory issue I am not seeing, in which case you are better off building the tree outside of R using RAxML. Best regards, Susan
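
One way to report object sizes from the R console is `object.size()`. The sketch below is a minimal base-R example; the toy objects are made-up stand-ins for `dm`, `treeNJ` and `fit` from the post, so the sizes it prints have nothing to do with the real data.

```r
# Toy objects standing in for the ones in the thread (not the real data).
objs <- list(
  dm     = dist(matrix(rnorm(200), nrow = 20)),
  treeNJ = list(edge = matrix(1:8, ncol = 2)),
  fit    = numeric(1000)
)

# Size of each object in bytes, plus a human-readable version.
bytes <- vapply(objs, function(x) as.numeric(object.size(x)), numeric(1))
pretty <- vapply(objs, function(x) format(object.size(x), units = "auto"),
                 character(1))

report <- data.frame(object = names(objs), bytes = bytes, size = pretty,
                     row.names = NULL)

# Largest objects first.
print(report[order(-report$bytes), ])
```

In a real session, the list could be built from the workspace instead, for example with `mget(c("dm", "treeNJ", "fit", "fitGTR"))`.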

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

Carlosgenetic-84 commented 3 years ago

Dear Susan,

Thank you very much for your quick response to my problem in R with DADA2. Regarding your question about the size of my objects, this is what I found:

Values:

dm = Large dist (27117930 elements, 437.3 MB)

Data:

treeNJ = Large phylo (4 elements, 3.7 MB)

fit = Large pml (22 elements, 22.8 MB)

fitGTR = Large pml (22 elements, 22.8 MB)

Is this the information that you requested, or were you asking for something else?

Once again, I appreciate you helping me with this issue, since I'm a real beginner with this.

Sincerely,

https://www.bionostra.com

Carlos Salinas Moreira

Carlosgenetic-84 commented 3 years ago

Hi Susan,

I was wondering whether the info I sent you was what you were asking for, and whether my problem seems to be a lack of memory or something else?

Thanks again,

spholmes commented 3 years ago

Yes, I would try building the ML tree with RAxML (outside of R) rather than with optim.pml.
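
Building the tree with an external program means the alignment first has to be written to disk. As a rough base-R sketch, aligned sequences can be written in FASTA format as shown below. The sequences and the file name are hypothetical; for a `phyDat` object such as `phangAlign`, phangorn's `write.phyDat()` is probably the more convenient route.

```r
# Hypothetical aligned sequences: equal length, gaps written as "-".
aln <- c(ASV1 = "ACGT-ACGTA",
         ASV2 = "ACGTTACGTA",
         ASV3 = "ACG--ACGTT")

# An alignment must have the same number of columns for every sequence.
stopifnot(length(unique(nchar(aln))) == 1)

# Interleave ">name" header lines with the sequence lines.
fasta_lines <- as.vector(rbind(paste0(">", names(aln)), unname(aln)))

# Write to a temporary file (the file name is an arbitrary choice).
out_file <- file.path(tempdir(), "alignment.fasta")
writeLines(fasta_lines, out_file)

cat(readLines(out_file), sep = "\n")
```

The resulting file can then be passed to the external tree-building program according to that program's own documentation.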

Aircus commented 3 years ago

Hi! I am having the same issue, but I just want to do a PCoA analysis and therefore wanted to add tree information. Is there a good way to bypass this issue?

Thanks! Tong

spholmes commented 3 years ago

For DPCoA you do need to build a tree; the cheapest one to build uses neighbor joining (NJ). You might try that to build your tree in R, or use one of the external programs.
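
A neighbor-joining tree only needs a distance matrix. The sketch below uses `nj()` from the ape package on a small made-up distance matrix; the four labels and the distances are illustrative assumptions, and the tree-building step is skipped if ape is not installed.

```r
# Toy pairwise distances between four hypothetical sequences A-D.
d <- matrix(
  c(0, 2, 7, 8,
    2, 0, 7, 8,
    7, 7, 0, 3,
    8, 8, 3, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(LETTERS[1:4], LETTERS[1:4])
)

if (requireNamespace("ape", quietly = TRUE)) {
  # Neighbor joining returns an unrooted tree of class "phylo".
  tree_nj <- ape::nj(as.dist(d))
  print(tree_nj)
} else {
  message("Package 'ape' is not installed; skipping the tree-building step.")
}
```

Note that the `treeNJ` object in the original post was already built this way (`treeNJ <- NJ(dm)` ran before the failing `optim.pml` call), so the NJ step itself had succeeded there.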

marwa38 commented 3 years ago

@spholmes What size do you think R needs to run this in its own environment, rather than learning a new tool outside it? Is the drive I am working on (500 GB) enough? Do you think it is fine to create a phylogenetic tree in R? Thanks, Marwa

spholmes commented 3 years ago

Dear Marwa, it is not a question of hard drive space but of the CPU assigned for use within R. You can increase the CPU available to R if you have it; see the extensive documentation about this in Hadley Wickham's book "Advanced R": http://adv-r.had.co.nz/memory.html

marwa38 commented 3 years ago

Thank you very much @spholmes