privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/
183 stars 43 forks source link

Error running snp_readBGEN/BGI: "Wrong format of some variants" #449

Closed JasperHof closed 9 months ago

JasperHof commented 9 months ago

Hi Florian and users,

I encountered an error which I believe has been reported before, however I was not able to find the solution here on GitHub, or in the tutorials you provided. I created bgen files using qctool (8bits, using -ofiletype bgen_v1.2 -bgen-bits 8). Then, i take the following steps to try to read the .bgen/.bgi files in R:

################################

bgenfile = base filename

bgifile = paste0(bgenfile, ".bgi")

db_con <- RSQLite::dbConnect(RSQLite::SQLite(), bgifile) on.exit(RSQLite::dbDisconnect(db_con), add = TRUE) infos <- dplyr::collect(dplyr::tbl(db_con, "Variant"))

infos$chromosome = chromosome infos$myid <- with(infos, paste(chromosome, position, allele1 allele2, sep = "_"))

For example, infos$myid[1] = "21_11098615_T_G"

snps = list() snps[[1]] = infos$myid[indices]

bgi = snp_readBGI(bgifile, snp_id = snps) #################################

This gives me the error: "Error in format_snp_id(info_id) : Wrong format of some variants.". This error remains when I just try to read a single SNP (for checking). All my SNPs are biallelic, and I've also tried switching the order of alleles in infos$myid. I have re-installed my rlang, bigsnpr, and tidyr package, but nothing seems to work.

Do you have an idea what might cause this problem? Thanks in advance!

Best,

Jasper

privefl commented 9 months ago

So the error is with readBGI? Are you sure you need to provide a list there?

JasperHof commented 9 months ago

Hi Florian, thanks for the quick reply.

Ah, you are right that I should input a vector when using readBGI. However, that also did not work, using the same error. E.g., I tried:

bgi = snp_readBGI(bgifile, snp_id = "21_11098615_G_T")

Which again gave me the error "Error in format_snp_id(info_id) : Wrong format of some variants.". I get the same error when using snp_readBGEN.. so not sure what's going wrong

Thanks again,

Jasper

privefl commented 9 months ago

Okay, it seems the issue is actually with the SNP IDs derived from the BGI data.

Can you look at infos$myid, and maybe try bigsnpr:::format_snp_id on each of these until you find the one causing the issue.

JasperHof commented 9 months ago

Hi Florian,

Thanks again for the reply. I have run bigsnpr:::format_snp_id(infos$myid), but this returns all SNP ids (does not stop). For now I am just focusing on chr 21/22, so the first two characters are digits.

So... this is quite weird. When I am running this code, the first part works, but the final line does not:

snp_id = c("21_11098615_G_T", "21_14402660_C_T", "21_14444176_C_A")

function (snp_id) { snp_id <- ifelse(substr(snpid, 2, 2) == "", paste0("0", snp_id), snp_id) if (any(substr(snpid, 3, 3) != "")) stop("Wrong format of some variants.") snp_id }

bgi = snp_readBGI(bgifile, snp_id = snp_id) # Gives error: "Error in format_snp_id(info_id) : Wrong format of some variants"

So now, I believe there is a different cause to the problem. When loading the 'infos' variable using

infos <- dplyr::collect(dplyr::tbl(db_con, "Variant"))

Here, 'infos' does not contain my chromosome information, which is needed to match the SNPs. I entered this manually at first using 'infos$chromosome = chromosome', but I believe this has led to an error within the 'snp_readBGI' function. So, I will check how can I can get the ,bgi files to include the chromosome number :-).

Cheers,

Jasper

privefl commented 9 months ago

So, to make sure I understand correctly, the issue is that the BGI file does not provide any chromosome information?

JasperHof commented 9 months ago

Yes, I was unable to run readBGI/readBGEN, because my genotype/bgi files did not contain information on the chromosome number. So now, i included '-assume-chromosome' in qctool, and now the chromosome number is included. The readBGI/readBGEN functions worked for me after doing this.

Thanks again for the quick replies, maybe this can help a future user. Best,

Jasper