Closed Astahlke closed 5 years ago
Hi Thierry,
I've done a little more troubleshooting here. I can't replicate this issue on my local Mac, but it persists on the HPC. Why is the imputation module throwing a missing data warning when there aren't any NAs in the data set?
Any ideas?
Thank you!
> gc <- radiator::genomic_converter(data = miss.genlight,
+ output = "genlight",
+ imputation.method = "rf",
+ monomorphic.out = FALSE,
+ hierarchical.levels = "global",
+ verbose = TRUE)
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /mnt/ceph/stah3621/imputation
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: FALSE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: rf
hierarchical.levels: global
parallel.core: 47
#######################################################################
Importing data
Number of markers missing in all individuals and removed: 1
Tidy genomic data:
Number of markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals: 94
Preparing data for output
Data is bi-allelic
#######################################################################
####################### grur::grur_imputations ########################
#######################################################################
Imputation method: rf
Hierarchical levels: global
On-the-fly-imputations options:
number of trees to grow: 50
minimum terminal node size: 1
non-negative integer value used to specify random splitting: 10
number of iterations: 10
Number of CPUs: 47
Note: If you have speed issues: follow radiator's vignette on parallel computing
Number of populations: 1
Number of individuals: 94
Number of markers: 500
Proportion of missing genotypes before imputations: 0.298319
On-the-fly-imputations using Random Forests algorithm
Imputations computed globally, take a break...
Adjusting REF/ALT alleles to account for imputations...
generating REF/ALT dictionary
integrating new genotype codings...
Proportion of missing genotypes after imputations: 0
Computation time: 8 sec
################## grur::grur_imputations completed ###################
Generating adegenet genlight object without imputation
Generating adegenet genlight object WITH imputations
Writing tidy data set:
radiator_data_20190125@1528.rad
Writing tidy data set:
radiator_data_20190125@1528.rad
############################### RESULTS ###############################
Data format of input: genlight
Biallelic data
Number of common markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals 94
Computation time: 12 sec
################ radiator::genomic_converter completed ################
Warning message:
In radiator::radiator_imputations_module(data = input, imputation.method = imputation.method, :
Missing data is still present in the dataset
2 options:
run the function again with hierarchical.levels = 'global'
use common.markers = TRUE when using hierarchical.levels = 'strata'
> anyNA(as.matrix(gc$genlight.imputed))
[1] FALSE
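To look beyond a single anyNA() call, missing values can be counted per individual and per marker. The following is a minimal base R sketch on a toy matrix; `geno` and `missing_summary` are hypothetical names, and with real data the matrix would presumably come from as.matrix(gc$genlight.imputed).

```r
# Toy genotype matrix (individuals x markers), coded 0/1/2 with NA for missing.
# Hypothetical stand-in for as.matrix(gc$genlight.imputed).
geno <- matrix(
  c(0, 1, NA,
    2, NA, 1,
    0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("ind", 1:3), paste0("snp", 1:3))
)

# Summarise missingness overall, per individual (rows) and per marker (columns).
missing_summary <- function(m) {
  na <- is.na(m)
  list(
    prop_missing   = mean(na),     # overall proportion of missing genotypes
    per_individual = rowSums(na),  # count of NA per individual
    per_marker     = colSums(na)   # count of NA per marker
  )
}

res <- missing_summary(geno)
```

If `prop_missing` is 0 for the imputed object, the warning is probably not describing the returned genlight itself, which would fit the behaviour reported here.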
From what I can tell, the R and package versions are the same in the important ways:
On my local mac:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] bindrcpp_0.2.2 psych_1.8.12 vegan_2.5-3
[4] lattice_0.20-38 permute_0.9-4 LEA_2.4.0
[7] tidyr_0.8.2 adegenet_2.1.1 ade4_1.7-13
[10] randomForestSRC_2.8.0 radiator_0.0.21
And on the HPC. Could the locale variables have an impact?
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /opt/modules/devel/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /opt/modules/devel/R/3.5.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] dplyr_0.7.8 LEA_2.4.0 randomForestSRC_2.8.0
[4] bindrcpp_0.2.2 psych_1.8.12 vegan_2.5-3
[7] lattice_0.20-38 permute_0.9-4 tidyr_0.8.2
[10] radiator_0.0.21 adegenet_2.1.1 ade4_1.7-13
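Comparing the two sessionInfo() listings by eye is error-prone, so a small helper can list only the packages that differ. This is a rough base R sketch, not part of radiator or grur; `compare_versions`, `mac`, and `hpc` are hypothetical names, and the toy vectors contain only a few of the attached packages shown above.

```r
# Named character vectors of package versions. On each machine these could be
# built with something like:
#   vapply(sessionInfo()$otherPkgs, function(p) p$Version, character(1))
mac <- c(radiator = "0.0.21", randomForestSRC = "2.8.0", adegenet = "2.1.1")
hpc <- c(radiator = "0.0.21", randomForestSRC = "2.8.0", adegenet = "2.1.1",
         dplyr = "0.7.8")

# Return the packages whose version differs or that are attached on one side only.
compare_versions <- function(a, b) {
  pkgs <- sort(union(names(a), names(b)))
  out <- data.frame(
    package = pkgs,
    a = unname(a[pkgs]),  # NA when the package is absent from `a`
    b = unname(b[pkgs]),  # NA when the package is absent from `b`
    stringsAsFactors = FALSE
  )
  differs <- is.na(out$a) | is.na(out$b) | out$a != out$b
  out[differs, , drop = FALSE]
}

diffs <- compare_versions(mac, hpc)
```

This only covers attached packages; packages loaded via a namespace could differ as well and would need the same comparison.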
I can't reproduce the error.
The imputation module was moved out of radiator.
It now resides in grur only, because of cross-dependency issues when submitting to CRAN.
genomic_converter will be added to the grur imputations module in the next release of grur, next week.
Hi Thierry,
We have a new RF issue with v0.0.20, where a warning indicates that there's still missing data after imputation, even though I don't see any NA in the imputed genlight object.
Thanks for any help!
Amanda
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /mnt/ceph/stah3621/imputation
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: FALSE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: rf
hierarchical.levels: global
parallel.core: 47
#######################################################################
Importing data
Number of markers missing in all individuals and removed: 1
Tidy genomic data:
Number of markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals: 94
Preparing data for output
#######################################################################
####################### grur::grur_imputations ########################
#######################################################################
Imputation method: rf
Hierarchical levels: global
On-the-fly-imputations options:
number of trees to grow: 50
minimum terminal node size: 1
non-negative integer value used to specify random splitting: 10
number of iterations: 10
Number of CPUs: 47
Note: If you have speed issues: follow radiator's vignette on parallel computing
Number of populations: 1
Number of individuals: 94
Number of markers: 500
Proportion of missing genotypes before imputations: 0.298319
On-the-fly-imputations using Random Forests algorithm
Imputations computed globally, take a break...
Adjusting REF/ALT alleles to account for imputations...
generating REF/ALT dictionary
integrating new genotype codings...
Proportion of missing genotypes after imputations: 0
Computation time: 8 sec
################## grur::grur_imputations completed ###################
Generating adegenet genlight object without imputation
Generating adegenet genlight object WITH imputations
Writing tidy data set:
radiator_data_20190117@1007.rad
Writing tidy data set:
radiator_data_20190117@1007.rad
############################### RESULTS ###############################
Data format of input: genlight
Biallelic data
Number of common markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals 94
Computation time: 11 sec
################ radiator::genomic_converter completed ################
Warning messages:
1: In cleanup(mc.cleanup) : unable to terminate child: No such process
2: In radiator::radiator_imputations_module(data = input, imputation.method = imputation.method, :
Missing data is still present in the dataset
2 options:
run the function again with hierarchical.levels = 'global'
use common.markers = TRUE when using hierarchical.levels = 'strata'
> which(is.na(as.matrix(gc$genlight.imputed)))
integer(0)
> which(is.na(as.matrix(gc$genlight.no.imputation)))[1:10]
[1] 1 2 4 5 8 9 11 12 16 17
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /opt/modules/devel/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/modules/devel/R/3.5.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] bindrcpp_0.2.2 randomForestSRC_2.8.0 psych_1.8.10
[4] vegan_2.5-3 lattice_0.20-38 permute_0.9-4
[7] tidyr_0.8.2 adegenet_2.1.1 ade4_1.7-13
[10] radiator_0.0.18
loaded via a namespace (and not attached):
[1] nlme_3.1-137 fs_1.2.6 usethis_1.4.0
[4] devtools_2.0.1 gmodels_2.18.1 rprojroot_1.3-2
[7] tools_3.5.0 backports_1.1.3 R6_2.3.0
[10] spData_0.3.0 lazyeval_0.2.1 mgcv_1.8-26
[13] colorspace_1.4-0 withr_2.1.2 sp_1.3-1
[16] tidyselect_0.2.5 prettyunits_1.0.2 mnormt_1.5-5
[19] processx_3.2.1 curl_3.3 compiler_3.5.0
[22] cli_1.0.1 expm_0.999-3 desc_1.2.0
[25] scales_1.0.0 readr_1.3.1 callr_3.1.1
[28] stringr_1.3.1 digest_0.6.18 foreign_0.8-71
[31] pkgconfig_2.0.2 htmltools_0.3.6 fst_0.8.10
[34] sessioninfo_1.1.1 rlang_0.3.1 shiny_1.2.0
[37] bindr_0.1.1 gtools_3.8.1 spdep_0.8-1
[40] dplyr_0.7.8 magrittr_1.5 Matrix_1.2-15
[43] Rcpp_1.0.0 munsell_0.5.0 ape_5.2
[46] stringi_1.2.4 MASS_7.3-51.1 pkgbuild_1.0.2
[49] plyr_1.8.4 grid_3.5.0 parallel_3.5.0
[52] gdata_2.18.0 listenv_0.7.0 promises_1.0.1
[55] crayon_1.3.4 deldir_0.1-15 splines_3.5.0
[58] hms_0.4.2 ps_1.3.0 pillar_1.3.1
[61] igraph_1.2.2 boot_1.3-20 seqinr_3.4-5
[64] reshape2_1.4.3 codetools_0.2-16 pkgload_1.0.2
[67] LearnBayes_2.15.1 glue_1.3.0 data.table_1.12.0
[70] remotes_2.0.2 httpuv_1.4.5.1 testthat_2.0.1
[73] gtable_0.2.0 purrr_0.2.5 future_1.10.0
[76] amap_0.8-16 assertthat_0.2.0 ggplot2_3.1.0
[79] mime_0.6 xtable_1.8-3 coda_0.19-2
[82] later_0.7.5 tibble_2.0.1 pbmcapply_1.3.1
[85] memoise_1.1.0 cluster_2.0.7-1 globals_0.12.4