Closed. Rodin67 closed this issue 5 years ago.
Hi, thanks for reporting this.
I have created a minimal example in which all IDs in `data.fam` are numeric, but I cannot reproduce the issue. The following shows `str()` of the data frames within `evaluate_check_ancestry`:
str(refSamples)
'data.frame': 1184 obs. of 2 variables:
$ IID: chr "NA19919" "NA19916" "NA19835" "NA20282" ...
$ Pop: chr "ASW" "ASW" "ASW" "ASW" ...
str(samples)
'data.frame': 200 obs. of 2 variables:
$ FID: int 26 125 162 169 147 152 187 17 153 5 ...
$ IID: int 26 125 162 169 147 152 187 17 153 5 ...
str(pca_data)
'data.frame': 1384 obs. of 4 variables:
$ FID: chr "181" "182" "183" "184" ...
$ IID: chr "181" "182" "183" "184" ...
$ PC1: num 0.006036 0.006859 -0.001683 0.000583 0.006763 ...
$ PC2: num 0.01439 0.00602 -0.00773 -0.00948 0.00841 ...
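In the output above, `samples$IID` is an integer column while `pca_data$IID` is character. In R, that type mismatch alone should not make IDs go missing. `%in%` (via `match()`) first coerces both sides to a common type, here character, so `26L` matches `"26"`.

The sketch below lists the IDs in the PCA data that appear in neither ID set. The three data frames are small made-up stand-ins named after the objects above, not the real plinkQC objects. The membership test is a hypothetical diagnostic, not the check plinkQC itself runs.

```r
# Hypothetical stand-ins mirroring the str() output above:
# reference samples have character IIDs, study samples have integer IIDs.
refSamples <- data.frame(IID = c("NA19919", "NA19916"), Pop = c("ASW", "ASW"),
                         stringsAsFactors = FALSE)
samples <- data.frame(FID = c(26L, 125L), IID = c(26L, 125L))
pca_data <- data.frame(IID = c("NA19919", "NA19916", "26", "125"),
                       stringsAsFactors = FALSE)

# c() coerces the integer IIDs to character ("26", "125") when combined
# with the character reference IDs.
all_ids <- c(refSamples$IID, samples$IID)

# IDs present in the PCA data but found in neither ID set.
unmatched <- pca_data$IID[!pca_data$IID %in% all_ids]
print(unmatched)  # character(0): every ID is found
```

Running the same two lines on the real data frames would show exactly which IDs trigger the error.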
If I understand correctly, you suspect that `samples$IID` being numeric causes the error message? I don't see that here. Are there additional constraints on your IDs? Could you provide an example dataset where this fails?
Thank you!
I am closing this now as I cannot reproduce this issue. Feel free to re-open with example data that shows the issue.
Thanks
Describe the bug
Using `evaluate_check_ancestry` or `perIndividualQC` on a sample whose IDs are all numeric leads to an error.

To Reproduce
Use a .fam file containing only numeric IDs.
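The sketch below shows how numeric IDs come in as numbers. It assumes the .fam file is read with base R's `read.table()`, which plinkQC's own reader may not use. Under that assumption, purely numeric family and individual IDs are parsed as integer columns rather than character. The file contents are invented for illustration.

```r
# Write a tiny PLINK-style .fam file (FID IID father mother sex phenotype)
# whose IDs are all numeric; the contents are made up.
fam_path <- tempfile(fileext = ".fam")
writeLines(c("26 26 0 0 1 -9",
             "125 125 0 0 2 -9"), fam_path)

# Default parsing: numeric-looking IDs become integer columns.
fam <- read.table(fam_path, header = FALSE)
str(fam[, 1:2])  # V1 and V2 are int, not chr

# colClasses = "character" is recycled to every column,
# so the IDs are kept as text.
fam_chr <- read.table(fam_path, header = FALSE, colClasses = "character")
str(fam_chr[, 1:2])
```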
Expected behavior
The function returns the IDs of samples that fail the ancestry check.

Error messages
"There are samples in the prefixMergedDataset that cannot be found in refSamples or XXX.fam"
Session information:
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=fr_CA.UTF-8 LC_NUMERIC=C LC_TIME=fr_CA.UTF-8
[4] LC_COLLATE=fr_CA.UTF-8 LC_MONETARY=fr_CA.UTF-8 LC_MESSAGES=fr_CA.UTF-8
[7] LC_PAPER=fr_CA.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=fr_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] plinkQC_0.2.2 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.2
[6] readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 haven_2.1.1 lattice_0.20-38 colorspace_1.4-1
[5] vctrs_0.2.0 generics_0.0.2 getopt_1.20.3 utf8_1.1.4
[9] rlang_0.4.0 pillar_1.4.2 glue_1.3.1 optparse_1.6.2
[13] withr_2.1.2 tweenr_1.0.1 bit64_0.9-7 modelr_0.1.5
[17] readxl_1.3.1 plyr_1.8.4 munsell_0.5.0 gtable_0.3.0
[21] cellranger_1.1.0 rvest_0.3.4 labeling_0.3 UpSetR_1.4.0
[25] fansi_0.4.0 broom_0.5.2 Rcpp_1.0.2 scales_1.0.0
[29] backports_1.1.4 jsonlite_1.6 farver_1.1.0 bit_1.1-14
[33] gridExtra_2.3 digest_0.6.20 ggforce_0.3.1 hms_0.5.1
[37] stringi_1.4.3 polyclip_1.10-0 grid_3.6.1 cowplot_1.0.0
[41] cli_1.1.0 tools_3.6.1 magrittr_1.5 lazyeval_0.2.2
[45] crayon_1.3.4 pkgconfig_2.0.2 zeallot_0.1.0 MASS_7.3-51.4
[49] data.table_1.12.2 xml2_1.2.2 lubridate_1.7.4 assertthat_0.2.1
[53] httr_1.4.1 rstudioapi_0.10 R6_2.4.0 nlme_3.1-141
[57] compiler_3.6.1
Additional context
Adding something like `mutate_all(samples, .funs = function(x) as.character(x))` to the function, right after the `samples` data frame is created, helps.
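The column-to-character workaround can be sketched on a stand-alone data frame using base R only, so no extra package is needed. In dplyr 1.0 and later, `mutate_all()` is superseded; `mutate(across(everything(), as.character))` does the same conversion. The `samples` data frame here is a made-up stand-in for the one built inside the function.

```r
# Hypothetical stand-in for the samples data frame built inside the function.
samples <- data.frame(FID = c(26L, 125L, 162L), IID = c(26L, 125L, 162L))

# Base R equivalent of mutate_all(samples, .funs = as.character):
# assigning into samples[] converts every column to character
# while keeping the data.frame structure and column names.
samples[] <- lapply(samples, as.character)
str(samples)  # FID and IID are now chr: "26" "125" "162"
```

Using `samples[] <-` rather than `samples <-` matters here. A bare `lapply()` returns a plain list, whereas assigning into `samples[]` keeps the data frame class and its attributes.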