Closed caimiao0714 closed 3 years ago
Seems like an Rcpp issue that I have never seen before.
Which packageVersion("bigsnpr")
do you have?
I have recently added more checks before going into Rcpp to try catch possible errors before.
In fact, it seems to be one of my own assertions, which I have never encountered before. https://github.com/privefl/bigsnpr/blob/2768100433a5487aec7cf3832ee399bc8b738a31/src/read-bgen.cpp#L32
What is the range of the positions you're trying to read? Are there any 0s maybe?
Thank you! I’m sorry but what does these assertions do? I’m not sure why I have non-positive pos. I assume these SNP ids are correct, and I did not specify the ind_row in the function.
In fact, it seems to be one of my own assertions, which I have never encountered before. https://github.com/privefl/bigsnpr/blob/2768100433a5487aec7cf3832ee399bc8b738a31/src/read-bgen.cpp#L32
What is the range of the positions you're trying to read? Are there any 0s maybe?
How can I check the range of the positions? I did not specify the ind_row
, so assumingly all the positions will be read.
When you read the MFI files.
I tried to validate if there is any non-positive positions, and it seems that my mfi files do NOT include any non-positive pos. I added a condition to force V3 > 0 in the subset sentence (see my code below).
system.time({
list_snp_id <- foreach(chr = 1:22) %dopar% {
# cat("Processing chromosome", chr, "..\n")
mfi <- paste0("Data/ukb_imp_mfi/ukb_mfi_chr", chr, "_v3.txt")
infos_chr <- bigreadr::fread2(mfi, showProgress = FALSE)
infos_chr_sub <- subset(infos_chr, V6 > 0.01 & V3 > 0) ## <------ I added V3 > 0 to make sure all pos are positive
bim <- paste0("Data/ukb_snp_bim/ukb_snp_chr", chr, "_v2.bim")
map_chr <- bigreadr::fread2(bim)
joined <- dplyr::inner_join(
infos_chr_sub, map_chr,
by = c("V3" = "V4", "V4" = "V5", "V5" = "V6")
)
with(joined, paste(chr, V3, V4, V5, sep = "_"))
}
})
I'm using R 3.6.1 on CentOS Linux 7, bigsnpr 1.6.1
. Below is my session info:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /export/home/caim29/software/anaconda3/envs/r3.2/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] bigsnpr_1.6.1 bigstatsr_1.5.1 bigreadr_0.2.4 benchmarkme_1.0.7 doParallel_1.0.16 iterators_1.0.13
[7] foreach_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 benchmarkmeData_1.0.4 bigsparser_0.4.4 bigparallelr_0.3.1
[5] pillar_1.6.0 compiler_3.6.1 tools_3.6.1 gtable_0.3.0
[9] lifecycle_1.0.0 tibble_3.1.1 lattice_0.20-44 pkgconfig_2.0.3
[13] rlang_0.4.11 Matrix_1.3-3 DBI_1.1.1 flock_0.7
[17] yaml_2.2.1 bigassertr_0.1.4 dplyr_1.0.6 httr_1.4.2
[21] generics_0.1.0 vctrs_0.3.8 cowplot_1.1.1 grid_3.6.1
[25] tidyselect_1.1.1 data.table_1.14.0 glue_1.4.2 R6_2.5.0
[29] fansi_0.4.2 parallelly_1.25.0 pacman_0.5.1 purrr_0.3.4
[33] ggplot2_3.3.3 magrittr_2.0.1 scales_1.1.1 codetools_0.2-18
[37] ellipsis_0.3.2 assertthat_0.2.1 colorspace_2.0-1 utf8_1.2.1
[41] munsell_0.5.0 crayon_1.4.1
I'm suspecting pos not positive is not the real issue here (based on my code above). Let me know if you have any thoughts/suggestions.
Thank you, Miao
BTW, what is sum(lengths(list_snp_id))
?
BTW, what is
sum(lengths(list_snp_id))
?
I think this is just checking how many SNP IDs were included.
Yes, forget about that.
I'll try to reproduce what you did. Do you get this error already when restricting to 1 chromosome?
Do you get this error when using ncores = 1
?
Yes, forget about that. I'll try to reproduce what you did. Do you get this error already when restricting to 1 chromosome? Do you get this error when using
ncores = 1
?
I spent a whole day testing all different combinations of chr index and the number of cores. It seems that I encounter the same exact error message (see below) whenever chr 1 is included, even if I only use 1 core. But the program can work perfectly well when I don't include chr 1, regardless of how many scores I use.
terminate called after throwing an instance of ‘Rcpp::exception’
What(): Positions should be positive.
Aborted (core dumped)
So, the problem is with chromosome 1 only? It is possible that your file is corrupted of some kind?
You can compare tools::md5sum("Data/imp/ukb_imp_chr1_v3.bgen")
with https://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=997.
You can compare
tools::md5sum("Data/imp/ukb_imp_chr1_v3.bgen")
with https://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=997.
Thank you! I checked and it does seem to be a data issue. The md5 checksum of chr 1 in my data set does not match that of the uk biobank website, while chr 2 to 22 do match. I'll try to re-download the data.
I’ve been trying to use the
bigsnpr
package to read the UK Biobank genetics data, but it seems that I can’t really replicate your code. I’ve been had “positions should be positive” error although (I think) I’m using the same code as you did. Here are my code:The error message is:
My questions is: I’m using the SNP id from the original mfi files and everything looks fine, I’m not sure why the position is not positive. Any comments/suggestions would be extremely useful and appreciated!
Thank you, Miao