privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Error when mapping file: Invalid argument. #488

Closed privefl closed 3 months ago

privefl commented 3 months ago

I do get an error at a later step (and I did not change anything in the original script, besides the pathname):

for (chr in 1:22) {
  cat(chr, ".. ", sep = "")
  ## indices in 'df_beta'
  ind.chr <- which(df_beta$chr == chr)
  ## indices in 'map_ldref'
  ind.chr2 <- df_beta$`_NUM_ID_`[ind.chr]
  ## indices in 'corr_chr'
  ind.chr3 <- match(ind.chr2, which(map_ldref$chr == chr))

  corr_chr <- readRDS(paste0("ldref_hm3_plus/LD_with_blocks_chr", chr, ".rds"))[ind.chr3, ind.chr3]

  if (chr == 1) {
    corr <- as_SFBM(corr_chr, tmp, compact = TRUE)
  } else {
    corr$add_columns(corr_chr, nrow(corr))
  }
}
1.. Error: Error when mapping file:
  Invalid argument.

This is an error I have never encountered.

Thanks!

Hannae

Originally posted by @alhannae in https://github.com/privefl/bigsnpr/issues/487#issuecomment-2007336719

privefl commented 3 months ago

@alhannae What are you passing as tmp?

alhannae commented 3 months ago

tmp <- tempfile(tmpdir = "tmp-data")

privefl commented 3 months ago

Do you have an existing directory called "tmp-data"?

alhannae commented 3 months ago

Hi Florian,

Yes, it already existed, but I didn't think anything of it since it had never given an error before. I created a new directory and that seemed to fix it.
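For reference, `tempfile()` only builds a path; it does not create `tmpdir` or check that it is usable. One way to rule out directory problems is to verify the directory before asking for a backing-file path. This is a minimal base-R sketch: `safe_tempfile` is a hypothetical helper (not a bigsnpr function), and whether a missing or unwritable directory was the actual cause of the mapping error here is an assumption.

```r
# Hypothetical helper (not part of bigsnpr): make sure the directory exists
# and is writable before asking tempfile() for a path inside it.
safe_tempfile <- function(tmpdir) {
  if (!dir.exists(tmpdir)) {
    dir.create(tmpdir, recursive = TRUE)
  }
  # file.access() returns 0 on success; mode = 2 tests write permission
  if (file.access(tmpdir, mode = 2) != 0) {
    stop("Directory '", tmpdir, "' is not writable.")
  }
  tempfile(tmpdir = tmpdir)
}

# Example: a "tmp-data" folder placed under the session temp directory
tmp <- safe_tempfile(file.path(tempdir(), "tmp-data"))
```

The resulting `tmp` could then be passed to `as_SFBM()` as in the loop above.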

After that, I had no problem running the code as-is, up to the point where the PRS are calculated.

I attached public-data3 from your other tutorial since I could not find the UKBB data you refer to. However, I get this error when running the big_prodVec function:

ukb <- snp_attach("tmp-data/public-data3.rds")
G <- ukb$genotypes
map <- dplyr::transmute(ukb$map,
                        chr = as.integer(chromosome), pos = physical.pos,
                        a0 = allele1, a1 = allele2)

map_pgs <- df_beta[1:4]; map_pgs$beta <- 1
map_pgs2 <- snp_match(map_pgs, map)

pred_auto <- big_prodVec(G, beta_auto * map_pgs2$beta, ind.col = map_pgs2[["_NUM_ID_"]], ncores = NCORES)

Error: Incompatibility between dimensions.
'y.col' and 'ind.col' should have the same length.
In addition: Warning message:
In beta_auto * map_pgs2$beta :
  longer object length is not a multiple of shorter object length

I calculated betas for all the variants in common between the sumstats and HapMap, and I think this is why it errors: beta_auto has more entries than there are matched variants in the test dataset. Is there any way to make this work without restricting to the variants in the test dataset? I would like to have adjusted betas for all the variants (instead of restricting to the variants in one dataset) so that I can then use them on different datasets without losing any SNPs.

Also, I noticed that in 'map_pgs' the betas are all '1', while in 'map_pgs2' (after the matching) some of those betas become '-1'. Is that supposed to happen?
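To illustrate the length mismatch and the sign change, here is a small base-R sketch with made-up values. It assumes that the result of `snp_match()` carries two index columns, `_NUM_ID_.ss` (row in the first argument, here `map_pgs`) and `_NUM_ID_` (row in the second argument, here `map`), and that `beta` is multiplied by -1 for variants whose alleles are swapped between the two datasets. The toy data frame only mimics that shape; it is not produced by bigsnpr.

```r
# Toy stand-ins (all values are made up for illustration).
# One adjusted effect per variant in 'df_beta':
beta_auto <- c(0.10, -0.20, 0.05, 0.30, -0.15)

# Mimics the shape of a snp_match() result: only 3 of the 5 variants are
# found in the target data, and one of them has swapped alleles (beta = -1).
map_pgs2 <- data.frame(beta = c(1, -1, 1))
map_pgs2[["_NUM_ID_.ss"]] <- c(1L, 3L, 4L)  # assumed: rows in 'df_beta'
map_pgs2[["_NUM_ID_"]]    <- c(7L, 2L, 9L)  # assumed: columns in 'G'

# Keep the full 'beta_auto', and subset it per target dataset
# before applying the allele-flip sign.
beta_matched <- beta_auto[map_pgs2[["_NUM_ID_.ss"]]] * map_pgs2$beta

# Lengths now agree, so a call of this form would pass the dimension check:
# big_prodVec(G, beta_matched, ind.col = map_pgs2[["_NUM_ID_"]])
stopifnot(length(beta_matched) == length(map_pgs2[["_NUM_ID_"]]))
```

With this pattern the full vector of adjusted betas is kept once and subset separately for each target dataset, so no variants are dropped from the stored effects.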

Thanks again!

Hannae

privefl commented 3 months ago

Please