meyer-lab-cshl / plinkQC

R package for quality control of plink genetic datasets

Cannot find sample in the prefix merged dataset #46

Closed rskl92 closed 2 years ago

rskl92 commented 2 years ago

Describe the bug
I am at the step where we create the ancestry plot. However, I receive this message:
Error in evaluate_check_ancestry(indir = indir, qcdir = indir, name = name, :
  There are no ~/IMAGENQC/imagen.fam samples in the prefixMergedDataset

To Reproduce
My code:

name <- 'imagen'
refname <- 'allphase3.no_duplicate_variants.clean.no_ac_gt_snps'
prefixMergedDataset <- "imagen.merge.allphase3.no_duplicate_variants.clean"

exclude_ancestry <- evaluate_check_ancestry(indir=indir, qcdir=indir, name=name,
    prefixMergedDataset,
    refSamplesFile=paste(indir, "/1000g_ID2POP.txt", sep=""),
    refColorsFile=paste(indir, "/1000g_PopColors.txt", sep=""),
    interactive=TRUE)

My .fam file looks like this:

   V1             V2
1: 00009974779239 00009974772399
2: 0000998732552  0000998732552
3: 0000998759862  0000998759862
4: 0000998888560  0000998888560
5: 0000999300251  0000999300251
6: 0000999549021  0000999549021

I suspected that the leading zeroes in the IDs were being mishandled when the plinkQC code reads them. However, I made the IDs alphanumeric and I still get the same message. I am out of ideas, as those IDs are present in the prefix merged dataset.
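The leading-zero suspicion can be illustrated in isolation. The following is a minimal sketch in base R, not plinkQC's own reading code: it writes two of the IDs above to a temporary file (the remaining .fam columns are made-up placeholder values) and compares a default `read.table` call with one that forces character columns.

```r
# Toy .fam file: IDs taken from the post, other columns are placeholder values.
fam_path <- tempfile(fileext = ".fam")
writeLines(c("0000998732552 0000998732552 0 0 0 -9",
             "0000998759862 0000998759862 0 0 0 -9"),
           fam_path)

# Default type conversion treats the IDs as numbers, so the zero padding is lost.
fam_default <- read.table(fam_path, header = FALSE)

# Forcing every column to character keeps the IDs exactly as written.
fam_char <- read.table(fam_path, header = FALSE, colClasses = "character")

print(is.numeric(fam_default$V1))
print(fam_char$V1)
```

If two files are read with different conversions, the same sample can end up with two different ID strings, which would make an ID comparison fail even though the sample is present in both.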

Expected behavior
A list of IDs that are not European and a principal components plot.

Error messages
Error in evaluate_check_ancestry(indir = indir, qcdir = indir, name = name, :
  There are no ~/IMAGENQC/imagen.fam samples in the prefixMergedDataset

Please complete the following information:

session_info()
─ Session info ────────────────────────────────────────────
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       RStudio
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/London
 date     2022-01-25
 rstudio  1.4.1103 Wax Begonia (desktop)
 pandoc   NA

─ Packages ────────────────────────────────────────────────
 ! package     version date (UTC) lib source
   cachem      1.0.6   2021-08-19 [1] CRAN (R 4.1.0)
   callr       3.7.0   2021-04-20 [1] CRAN (R 4.1.0)
   cli         3.1.1   2022-01-20 [1] CRAN (R 4.1.2)
   colorspace  2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
   cowplot     1.1.1   2020-12-30 [1] CRAN (R 4.1.0)
   crayon      1.4.2   2021-10-29 [1] CRAN (R 4.1.0)
   curl        4.3.2   2021-06-23 [1] CRAN (R 4.1.0)
   data.table  1.14.2  2021-09-27 [1] CRAN (R 4.1.0)
   desc        1.4.0   2021-09-28 [1] CRAN (R 4.1.0)
   devtools    2.4.3   2021-11-30 [1] CRAN (R 4.1.0)
   dplyr       1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
   ellipsis    0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
   fansi       1.0.2   2022-01-14 [1] CRAN (R 4.1.2)
   fastmap     1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
   fs          1.5.2   2021-12-08 [1] CRAN (R 4.1.0)
   generics    0.1.1   2021-10-25 [1] CRAN (R 4.1.0)
   getopt      1.20.3  2019-03-22 [1] CRAN (R 4.1.0)
   ggplot2     3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
 V glue        1.6.0   2022-01-22 [1] CRAN (R 4.1.2) (on disk 1.6.1)
   gridExtra   2.3     2017-09-09 [1] CRAN (R 4.1.0)
   gtable      0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
   hms         1.1.1   2021-09-26 [1] CRAN (R 4.1.0)
   lifecycle   1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
   magrittr    2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
   memoise     2.0.1   2021-11-26 [1] CRAN (R 4.1.0)
   munsell     0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
   optparse    1.7.1   2021-10-08 [1] CRAN (R 4.1.0)
   pillar      1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
   pkgbuild    1.3.1   2021-12-20 [1] CRAN (R 4.1.0)
   pkgconfig   2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
   pkgload     1.2.4   2021-11-30 [1] CRAN (R 4.1.0)
   plinkQC     0.3.4   2022-01-24 [1] Github (meyer-lab-cshl/plinkQC@a0337eb)
   plyr        1.8.6   2020-03-03 [1] CRAN (R 4.1.0)
   prettyunits 1.1.1   2020-01-24 [1] CRAN (R 4.1.0)
   processx    3.5.2   2021-04-30 [1] CRAN (R 4.1.0)
   ps          1.6.0   2021-02-28 [1] CRAN (R 4.1.0)
   purrr       0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
   R6          2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
   Rcpp        1.0.8   2022-01-13 [1] CRAN (R 4.1.2)
   readr       2.1.1   2021-11-30 [1] CRAN (R 4.1.0)
   remotes     2.4.2   2021-11-30 [1] CRAN (R 4.1.0)
   rlang       0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
   rprojroot   2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
   rstudioapi  0.13    2020-11-12 [1] CRAN (R 4.1.0)
   scales      1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
   sessioninfo 1.2.2   2021-12-06 [1] CRAN (R 4.1.0)
   sys         3.4     2020-07-23 [1] CRAN (R 4.1.0)
   testthat    3.1.1   2021-12-03 [1] CRAN (R 4.1.0)
   tibble      3.1.6   2021-11-07 [1] CRAN (R 4.1.0)
   tidyselect  1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
   tzdb        0.2.0   2021-10-27 [1] CRAN (R 4.1.0)
   UpSetR      1.4.0   2019-05-22 [1] CRAN (R 4.1.0)
   usethis     2.1.5   2021-12-09 [1] CRAN (R 4.1.0)
   utf8        1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
   vctrs       0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
   withr       2.4.3   2021-11-30 [1] CRAN (R 4.1.0)


rskl92 commented 2 years ago

I have checked whether the IDs in the original file that I want to QC are in the prefixMergedDataset, and they are.
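A check of this kind can be sketched as an exact string comparison between the two ID lists. This is only an illustration under stated assumptions: `missing_ids` is a hypothetical helper, not a plinkQC function, and it assumes the merged dataset's IDs are available in a whitespace-delimited, header-less file with the individual ID in the second column (as in a PLINK 1.9 `.eigenvec` file). The files below are toy data.

```r
# Hypothetical helper (not part of plinkQC): returns the individual IDs found
# in the .fam file but absent from the merged dataset's ID file.
missing_ids <- function(fam_file, merged_file) {
  # Read everything as character so zero-padded IDs are compared verbatim.
  fam <- read.table(fam_file, header = FALSE, colClasses = "character")
  merged <- read.table(merged_file, header = FALSE, colClasses = "character")
  setdiff(fam$V2, merged$V2)
}

# Toy files: the second sample lost its zero padding in the merged file.
fam_file <- tempfile(fileext = ".fam")
merged_file <- tempfile(fileext = ".eigenvec")
writeLines(c("0000998732552 0000998732552 0 0 0 -9",
             "0000998759862 0000998759862 0 0 0 -9"),
           fam_file)
writeLines(c("0000998732552 0000998732552 0.01 0.02",
             "998759862 998759862 0.03 0.04"),
           merged_file)

absent <- missing_ids(fam_file, merged_file)
print(absent)
```

An empty result would mean every .fam ID appears verbatim in the merged file; any ID listed differs at the string level, for example through lost zero padding.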