liamrevell / phytools

GNU General Public License v3.0

Strange discrepancy in processing time for make.simmap #113

Closed: flashton2003 closed this issue 1 year ago

flashton2003 commented 2 years ago

Hello,

I'm trying to write a workflow that randomly subsamples a phylogeny and repeats the analysis many times (perhaps 100 to 1000), to understand how sampling influences my SIMMAP analysis. Each time, I take only 30 tips from the tree for the SIMMAP analysis.
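A minimal sketch of the subsample-and-repeat loop described above, assuming `tree` is an ape `phylo` object and `states` is a named vector of discrete trait values whose names match `tree$tip.label`. The helper names `sample_tips` and `run_replicate` are hypothetical, not part of phytools; the only library calls used are `ape::keep.tip` and `phytools::make.simmap`. Recording the elapsed time and the sampled tips per replicate makes it possible to re-run exactly the subsample that turned out slow.

```r
# Sketch of the subsample-and-repeat workflow (hypothetical helper names).
# Assumes: `tree` is an ape "phylo" object; `states` is a named vector of
# discrete trait values whose names match tree$tip.label; ape and phytools
# are installed.

# Draw n_tips distinct tip labels at random (base R only).
sample_tips <- function(tip_labels, n_tips = 30) {
  if (n_tips > length(tip_labels)) {
    stop("n_tips is larger than the number of tips in the tree")
  }
  sample(tip_labels, n_tips)
}

# One replicate: prune the tree to the sampled tips, subset the trait data
# to match, run stochastic mapping, and record the elapsed wall-clock time.
run_replicate <- function(tree, states, n_tips = 30, nsim = 100, model = "ER") {
  keep <- sample_tips(tree$tip.label, n_tips)
  sub_tree <- ape::keep.tip(tree, keep)
  sub_states <- states[sub_tree$tip.label]
  elapsed <- system.time(
    maps <- phytools::make.simmap(sub_tree, sub_states, model = model, nsim = nsim)
  )[["elapsed"]]
  list(tips = keep, maps = maps, elapsed = elapsed)
}

# Example usage (not run here):
# set.seed(1)
# reps <- lapply(seq_len(100), function(i) run_replicate(tree, states))
# sapply(reps, `[[`, "elapsed")
```

Keeping `tips` in each result means a slow replicate can be rebuilt with `ape::keep.tip(tree, reps[[i]]$tips)` and inspected on its own.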

However, I'm seeing a strange discrepancy in how long the make.simmap function takes to run.

In the first random sample, which I made for testing and developing the pipeline, the analysis ran very quickly, finishing in about 1 minute, and seemed to give sensible results. However, once I started generating the input files automatically, the analysis doesn't complete, even after about 45 minutes. I've inspected the files, and there is no discernible difference between the inputs for the fast and slow samples (although it's definitely possible I'm missing something!).
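One way to make the comparison between fast and slow inputs systematic is to summarise each tree/trait pair numerically rather than by eye. This is a hypothetical helper, and the quantities it reports (tip count, name matching, missing states, state frequencies, branch-length range, zero-length branches) are only guesses at what could plausibly differ between subsamples; the thread itself does not identify the cause. It assumes `tree` has `$tip.label` and `$edge.length`, as an ape `phylo` object does, and that `states` is a named vector.

```r
# Hypothetical helper: summarise properties of a tree/trait pair that can
# differ between subsamples even when the input files look alike.
# Assumes `tree` has $tip.label and $edge.length (as an ape "phylo" object
# does) and `states` is a named vector of discrete trait values.
summarise_input <- function(tree, states) {
  bl <- tree$edge.length
  list(
    n_tips          = length(tree$tip.label),
    names_match     = setequal(names(states), tree$tip.label),
    n_missing       = sum(is.na(states)),
    state_counts    = table(states),
    min_branch      = min(bl),
    max_branch      = max(bl),
    n_zero_branches = sum(bl == 0),
    total_length    = sum(bl)
  )
}

# Example usage (not run here): compare the fast and slow inputs side by side.
# str(summarise_input(fast_tree, fast_states))
# str(summarise_input(slow_tree, slow_states))
```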

Any idea what might be happening here?

Here is the script I've used, the input files (for both fast and slow), and my sessionInfo().

Script.

Slow tree & metadata.

Fast tree & metadata.

I've run 11 random samples, generated with the same procedure as the slow one, and they were all slow as well (I stopped each one after 30 minutes of running).

```
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
[1] readr_2.0.0    phytools_1.0-3 maps_3.4.0    
[4] ape_5.5       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7              compiler_4.1.0         
 [3] pillar_1.6.2            tools_4.1.0            
 [5] bit_4.0.4               lifecycle_1.0.0        
 [7] tibble_3.1.3            nlme_3.1-152           
 [9] lattice_0.20-44         pkgconfig_2.0.3        
[11] rlang_0.4.11            Matrix_1.3-4           
[13] fastmatch_1.1-3         igraph_1.2.6           
[15] cli_3.0.1               rstudioapi_0.13        
[17] expm_0.999-6            coda_0.19-4            
[19] withr_2.4.2             vctrs_0.3.8            
[21] hms_1.1.0               tidyselect_1.1.1       
[23] bit64_4.0.5             combinat_0.0-8         
[25] grid_4.1.0              scatterplot3d_0.3-41   
[27] glue_1.4.2              R6_2.5.0               
[29] plotrix_3.8-2           fansi_0.5.0            
[31] vroom_1.5.3             phangorn_2.7.1         
[33] purrr_0.3.4             tzdb_0.1.2             
[35] magrittr_2.0.1          codetools_0.2-18       
[37] ellipsis_0.3.2          MASS_7.3-54            
[39] mnormt_2.0.2            numDeriv_2016.8-1.1    
[41] quadprog_1.5-8          utf8_1.2.2             
[43] tmvnsim_1.0-2           crayon_1.4.1           
[45] clusterGeneration_1.3.7
```

liamrevell commented 1 year ago

Dear @flashton2003, do you know which step in your code is running so slowly? My guess is that it is the optimization of the Mk model (using fitMk), whose run time can vary quite a bit from problem to problem, not just with the size of the phylogenetic tree. If you have not resolved this yet, please feel free to send me a follow-up email and I will see whether I can help. -- Liam
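To answer the question of which step is slow, the model fit and the stochastic mapping can be timed separately. The sketch below assumes `tree` is an ape `phylo` object, `states` is a named vector of discrete trait values, and phytools is installed; `timed` and `time_steps` are hypothetical helper names. If the Mk optimization is the bottleneck, the `fitMk` time alone would be expected to be large for the slow inputs and small for the fast one.

```r
# Hypothetical helpers for timing each step separately, following the
# suggestion above that the Mk model fit may be the slow part.
# Assumes `tree` is an ape "phylo" object, `states` is a named vector of
# discrete trait values, and phytools is installed.

# Evaluate an expression once; return its value and elapsed seconds.
timed <- function(expr) {
  elapsed <- system.time(value <- expr)[["elapsed"]]
  list(value = value, elapsed = elapsed)
}

# Time fitMk on its own, then make.simmap. With the default Q = "empirical",
# make.simmap estimates the transition matrix itself, so its time includes
# that estimation plus the nsim stochastic maps.
time_steps <- function(tree, states, model = "ER", nsim = 100) {
  fit <- timed(phytools::fitMk(tree, states, model = model))
  sim <- timed(phytools::make.simmap(tree, states, model = model, nsim = nsim))
  c(fitMk = fit$elapsed, make.simmap = sim$elapsed)
}

# Example usage (not run here):
# time_steps(fast_tree, fast_states)
# time_steps(slow_tree, slow_states)
```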

flashton2003 commented 1 year ago

Hi Liam,

Thanks for the response. I ended up using an alternative tool, so this isn't urgent.

If the problem occurs again I'll get in touch.

Thanks again,

Phil