privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

I am trying to use the snp_readBGEN() function to read BGEN files. I see the below errors for different SNP lists. #285

Closed (umakann-db closed 2 years ago)

umakann-db commented 2 years ago

I am trying to use the snp_readBGEN() function to read BGEN files. I see the below errors for different SNP lists.

Error1: RserveException: eval failed

Code snippet used for Error1

list_snp_id <- list("21_10460657_A_G")

rds <- bigsnpr::snp_readBGEN(
    bgenfiles   = "/databricks/driver/simulate_50000_samples_1000_simulated.bgen",
    list_snp_id = list_snp_id,
    backingfile = "/databricks/driver/test1"
  )

Error2: Some variants have not been found (stored in '/databricks/driver/simulate_50000_samples_1000_simulated_not_found.rds').

Code snippet used for Error2

list_snp_id <- list("21_39870309_G_A")

rds <- bigsnpr::snp_readBGEN(
    bgenfiles   = "/databricks/driver/simulate_50000_samples_1000_simulated.bgen",
    list_snp_id = list_snp_id,
    backingfile = "/databricks/driver/test1"
  )

Some variants from the BGEN file, as listed by bgenix:

Welcome to bgenix
(version: 1.1.7, revision )

(C) 2009-2017 University of Oxford

Building query                                              :  (0/?,0.0s,0.0/s)
Building query                                              :  (467/?,0.0s,572765.1/s)
# bgenix: started 2021-12-16 04:14:53
alternate_ids  rsid  chromosome  position  number_of_alleles  first_allele  alternative_alleles
rs75434219     .     21          9671019   2                  T             C
rs4117812      .     21          10460657  2                  A             G
rs111709978    .     21          10794845  2                  AGAGTG        A
rs28972678     .     21          10813206  2                  C             G
rs28970553     .     21          10828213  2                  T             C
rs28972302     .     21          10841966  2                  C             G
rs71326366     .     21          10966804  2                  A             ATGTG
rs2479471      .     21          11033803  2                  G             A
rs57224481     .     21          11045982  2                  CA            C

Any advice on this issue? Thank you very much.

Originally posted by @umakann-db in https://github.com/privefl/bigsnpr/issues/97#issuecomment-996121535

privefl commented 2 years ago

I have never seen the first error before.

For the second one, it is usually a matter of swapping the alleles.
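A minimal sketch of what swapping the alleles could look like, assuming the IDs follow the `chr_pos_a1_a2` format used in the snippets above. `swap_alleles()` is a hypothetical helper written for illustration, not part of bigsnpr, and whether the swapped ID actually exists in the file still has to be checked against the bgenix listing.

```r
# Hypothetical helper (not part of bigsnpr): swap the two alleles in IDs
# of the form "<chr>_<pos>_<a1>_<a2>".
swap_alleles <- function(ids) {
  parts <- strsplit(ids, "_", fixed = TRUE)
  vapply(parts, function(p) {
    stopifnot(length(p) == 4)  # expect exactly chr, pos, a1, a2
    paste(p[1], p[2], p[4], p[3], sep = "_")
  }, character(1))
}

ids     <- c("21_39870309_G_A")
swapped <- swap_alleles(ids)

# One character vector per BGEN file; retry the read with the swapped ID.
list_snp_id <- list(swapped)
```

The resulting `list_snp_id` can then be passed to `snp_readBGEN()` in place of the original list.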

Please give a bit more info on when you get the first error, and on what type of architecture / installation you're running this on.

umakann-db commented 2 years ago

I am seeing this error every time I run the code below.

list_snp_id <- list("21_10460657_A_G")

rds <- bigsnpr::snp_readBGEN(
    bgenfiles   = "/databricks/driver/simulate_50000_samples_1000_simulated.bgen",
    list_snp_id = list_snp_id,
    backingfile = "/databricks/driver/test1"
  )

Platform: Databricks
Databricks Runtime Version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)
Docker Image: projectglow/databricks-glow:1.1.2

privefl commented 2 years ago

Could you run debugonce to tell me exactly which line of the code inside the function is failing?
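For reference, a sketch of how `debugonce()` could be used here, assuming an interactive R session with bigsnpr installed. The paths and SNP ID are the ones from the report above; the guard and the `can_debug` variable are only there so the snippet does nothing when run non-interactively.

```r
# Only meaningful in an interactive session with bigsnpr available.
can_debug <- interactive() && requireNamespace("bigsnpr", quietly = TRUE)

if (can_debug) {
  # Flag the function so that its next call opens the step-through browser.
  debugonce(bigsnpr::snp_readBGEN)

  rds <- bigsnpr::snp_readBGEN(
    bgenfiles   = "/databricks/driver/simulate_50000_samples_1000_simulated.bgen",
    list_snp_id = list("21_10460657_A_G"),
    backingfile = "/databricks/driver/test1"
  )
  # In the browser: `n` runs the next line, `s` steps into a call, `Q` quits.
  # The last line displayed before the error (or crash) is the failing one.
}
```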

umakann-db commented 2 years ago

Thank you so much for your response. Here's the actual error from the command.

terminate called after throwing an instance of 'Rcpp::exception'
  what():  Probabilities should be stored using 8 bits.

I changed the number of bits used to represent each probability value to 8, and it now works fine.
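The thread does not say which tool was used to change the bit depth. One possible way, sketched here under the assumption that qctool (v2) is installed and on the PATH, is to re-encode the file with its `-bgen-bits` option. The output path is hypothetical, and the command only runs if both qctool and the input file are found.

```r
# Assumption: qctool v2 is available; the thread does not name the tool used.
in_bgen  <- "/databricks/driver/simulate_50000_samples_1000_simulated.bgen"
out_bgen <- "/databricks/driver/simulated_8bits.bgen"  # hypothetical output path

# -bgen-bits sets the number of bits used to store each probability.
qctool_args <- c("-g", in_bgen, "-og", out_bgen, "-bgen-bits", "8")

if (nzchar(Sys.which("qctool")) && file.exists(in_bgen)) {
  status <- system2("qctool", qctool_args)
  stopifnot(status == 0)
  # The new file also needs a bgenix index (.bgi) before reading it,
  # e.g. `bgenix -g <file> -index`.
}
```

If the file is produced with PLINK 2 instead, its BGEN 1.2 export has a comparable `bits=8` modifier.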