privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/
186 stars 44 forks source link

I keep getting PGSs equal 0 #516

Open trinetn opened 3 hours ago

trinetn commented 3 hours ago

Dear Florian

I have been using LDpred2 to estimate PG-scores for a while now and I have been very happy with the tool. I am currently having an issue with an LDpred2 analysis. I hope you might be able to help us. Any help is much appreciated!!

I want to estimate PGSs using weights estimated with LDpred2 using a reference sample, but for some reason I keep getting PGSs equal 0. My code can be seen below:

#---------------------------------------------------------------------------------------------------------------------------------------------------------------------

#First I load my libraries
library(bigsnpr)
library(dplyr)

#Next I load my weights that I have previously estimated with a reference genome and a GWAS sumstats – I also load a table with information about which SNPs where used to estimate weights and which alleles were a1 – I think you often call this table df_beta
load("weights.RData")
df_beta <- read.table(file=”df_beta_table.txt”, header=T)

#I now load my big SNP object – I only have 3 individuals in this object, since I am still running this as test – The object was created with snp_readBed() and I have restricted my object to hapmap3 + SNPs
obj.bigSNP <- snp_attach("Test_object.rds")

#I define the G matrix and the map variants:
G   <- obj.bigSNP$genotypes
map <- dplyr::transmute(obj.bigSNP$map,
chr = chromosome, pos = physical.pos,a0 = allele2, a1 = allele1,rsid=marker.ID)

#I obtain pos, a1, a0, rsid and chr from the df_beta and create a beta
map_pgs <- df_beta[1:5]
map_pgs$beta <- 1

#I identify variants that were used to estimate the weights and create a beta that reflect a potential allele flip
map_pgs2 <- snp_match(map_pgs, map,join_by_pos=F)

#I  now run the last step, where I want to estimate my PGSs, but for some reason I keep getting PGSs equal 0 for each of my individuals
final_pred <- big_prodVec(G, final_beta_auto * map_pgs2$beta,
                         ind.col = map_pgs2[["_NUM_ID_"]],
                         ncores = NCORES)

#----------------------------------------------------------------------------------------------

Can you think of any thing that I am doing wrong? I would be grateful for any suggestions

Best wishes Trine

privefl commented 3 hours ago

So, final_pred is only 0s?

I would check two things:

trinetn commented 2 hours ago

Yes, final_pred is only 0s

The length of final_beta_auto is around 1,1 mio SNPs and the vector contains only non-zero values

privefl commented 2 hours ago

And G does not contain only 0s?

trinetn commented 2 hours ago

No, it contains 0s,1s and 2s for all three individuals

privefl commented 1 hour ago

Odd. Is this on GenomeDK, somewhere I can access? If you can save both final_beta_auto and map_pgs2 in some rds file, close to "Test_object.rds" somewhere I can access, I can have a look at it.

trinetn commented 32 minutes ago

Yes, I am working in the closed zone Brain on genomeDK, but I can´t share data with you - it is data from the ABCD cohort. I will try with a new target sample (a dataset from the Plink webpage) and see if I get the same results. I will write back when I have run the new analysis. Thank you for now