Closed uki-uiu closed 6 months ago
You need to ask for those files to be put on your server.
This has been discussed in other issues here.
BTW, `dir` is for the directory, not the full path.
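For offline use, one way to read the advice above is to fetch the map files once on a machine with internet access, copy them into a directory on the server, and point `dir` at that directory. The sketch below only checks which per-chromosome files are present. The file-name pattern is an assumption (based on the decompressed file names in the 1000-genomes-genetic-maps repository), and `missing_map_files` is a hypothetical helper, not part of bigsnpr.

```r
# Hypothetical helper: list the map files that are NOT yet in `dir`.
# Assumption: files are named "chr<N>.<type>.interpolated_genetic_map".
missing_map_files <- function(dir, chrs = 1:22, type = "OMNI") {
  expected <- paste0("chr", chrs, ".", type, ".interpolated_genetic_map")
  expected[!file.exists(file.path(dir, expected))]
}

# Toy check in a temporary directory that contains only the chromosome 1 file
map_dir <- file.path(tempdir(), "genetic_maps")
dir.create(map_dir, showWarnings = FALSE)
file.create(file.path(map_dir, "chr1.OMNI.interpolated_genetic_map"))
still_missing <- missing_map_files(map_dir, chrs = 1:3)
print(still_missing)

# Once nothing is missing, the call passes the directory, not a file path:
# POS2 <- snp_asGeneticPos(CHR, POS, dir = map_dir)
```

If the helper reports missing files, those are the ones to request for the server before running the step offline.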
Otherwise, you can use something like a 3 Mb window, identify nearly independent LD blocks from that, and then recompute all the values within the LD blocks (by using something like `POS2 <- block_id` and `size = 1e-4`); this will probably give you the best LD matrix.
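To see why passing block IDs as positions restricts correlations to within blocks, here is a small base R sketch. It relies on the documented behaviour that, when `infos.pos` is supplied, `snp_cor()` treats `size` as a window of `size * 1000` in the units of `infos.pos`. The block assignment is made up for illustration.

```r
# Toy example: 6 variants assigned to 3 hypothetical LD blocks
block_id <- c(1, 1, 1, 2, 2, 3)
size <- 1e-4

# With `infos.pos` supplied, the window is size * 1000 in units of `infos.pos`
window <- size * 1000  # 0.1, smaller than the gap of 1 between block IDs

# Pairs of variants that fall inside the window, i.e. that share a block
same_block <- abs(outer(block_id, block_id, "-")) <= window
print(same_block * 1)

# The corresponding call would look something like this (not run here;
# block_id must hold one value per variant in ind.chr):
# corr <- snp_cor(G, ind.col = ind.chr, infos.pos = block_id,
#                 size = 1e-4, ncores = NCORES)
```

The printed matrix is block-diagonal: every pair within a block is kept, and no pair across blocks is.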
Hello! Thank you for getting back to me!
I decided to limit my analysis to the HapMap SNPs only, as described in your tutorial, and to use the genetic distances available from https://github.com/joepickrell/1000-genomes-genetic-maps/tree/master/interpolated_from_hapmap to avoid using the `snp_asGeneticPos` function.
I input the genetic distances directly (DIST = genetic positions derived from the link above, and df_beta <- info.pos), and used this within the loop:
pos2_table <- df_beta[df_beta$chr == chr, ]
POS2<-sort(pos2_table$DIST)
and
corr0 <- snp_cor(G, ind.col = ind.ch2, size = 3 / 1000, infos.pos = POS2, ncores = NCORES)
This generates a 15826 x 15826 correlation matrix without any errors or warnings.
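When genetic positions come from a local map rather than from `snp_asGeneticPos`, one possible approach is linear interpolation with base R's `approx()`. This is only a sketch under assumptions: `map` is a hypothetical per-chromosome table with columns `pos` (bp) and `cM`, and the variant positions are invented.

```r
# Hypothetical local genetic map for one chromosome (bp position, cM)
map <- data.frame(pos = c(1000, 2000, 4000), cM = c(0.0, 0.5, 1.5))

# Physical positions of the variants, in the same order as the genotype columns
variant_pos <- c(1500, 2000, 3000, 5000)

# Linear interpolation; rule = 2 clamps positions outside the map to its ends
POS2_toy <- approx(x = map$pos, y = map$cM, xout = variant_pos, rule = 2)$y
print(POS2_toy)
```

Because the result follows the order of `variant_pos`, it stays aligned with the columns passed to `snp_cor()`. That alignment is worth double-checking whenever positions are sorted separately from the variants.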
The rest of the script runs smoothly, except that at the end, when I perform the scoring, all the participants end up with "NA" scores (pred_auto contains only NA values, so I cannot create a model at the end). I have checked the other LDpred issues on GitHub but can't find one that matches my situation.
Do you see any reason I am getting this error?
Please advise.
There are several issues like this here. Basically, if you have NAs in the polygenic scores, it means you either have NAs in the effects you get from LDpred2-auto, or you have NAs in the genotype matrix that you use to compute the polygenic scores.
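The two cases can be told apart with a toy example in base R. The matrix and effect sizes below are invented. The point is only that one missing genotype makes that individual's score NA, while one missing effect makes every score NA.

```r
# Toy genotypes: 3 individuals (rows) x 3 variants (columns), filled by column;
# individual 2 has a missing genotype at variant 2
G_toy <- matrix(c(0, 1, 2,
                  1, NA, 0,
                  2, 2, 1), nrow = 3)

beta_ok <- c(0.1, -0.2, 0.05)  # complete effects
beta_na <- c(0.1, NA, 0.05)    # one missing effect

score_ok <- drop(G_toy %*% beta_ok)  # NA only for individual 2
score_na <- drop(G_toy %*% beta_na)  # NA for everyone

# Quick diagnostics
print(anyNA(beta_na))          # any missing effects?
print(colSums(is.na(G_toy)))   # missing genotypes per variant

# For a bigsnpr genotype matrix, bigstatsr::big_counts(G) gives per-variant
# counts that include missing values (not run here).
```

If all participants get NA scores, it therefore makes sense to check the effects first, and then check whether every individual has at least one missing genotype among the variants used.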
Thank you, Dr. Privé, I will look into it!
Any update on this?
I managed to solve the issue after going through the other issues and solutions posted. Thank you.
Could you quickly summarize your solution for others?
And then close the issue, if there is nothing else on this.
I ended up using the LDpred2-grid option (pred_grid) and imputed any missing genotypes:
G2 <- snp_fastImputeSimple(G, method = "mean2")
And to avoid the issues I was facing with the script when scoring the test set, I extracted the SNPs from the PGS created in the earlier steps (with the best parameters after tuning on the validation cohort). I then used another tool to score the participants in the test set.
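For readers unfamiliar with the imputation step above, here is a base R sketch of mean imputation. It follows my understanding that `method = "mean2"` replaces each missing genotype with the variant's mean rounded to 2 decimal places. `impute_mean2` is a hypothetical stand-in for illustration, not the bigsnpr implementation.

```r
# Hypothetical stand-in: replace NAs in each column by the column mean,
# rounded to 2 decimal places
impute_mean2 <- function(G) {
  for (j in seq_len(ncol(G))) {
    miss <- is.na(G[, j])
    G[miss, j] <- round(mean(G[, j], na.rm = TRUE), 2)
  }
  G
}

# Toy genotypes: 4 individuals x 2 variants, one missing value per variant
G_toy <- matrix(c(0, 1, 1, NA,
                  2, NA, 1, 2), nrow = 4)
G_imp <- impute_mean2(G_toy)
print(G_imp)
```

After imputation the matrix has no missing values, so scores computed from it can only be NA if the effects themselves contain NAs.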
I am trying to run LDpred2 offline using hg38 genome-wide data; however, the script does not allow me to select the latest genetic map coordinates for this purpose:
POS2 <- snp_asGeneticPos(CHR, POS, dir = "/filepath/genetic_map_hg38_withX.txt.gz")
This step only works with internet access and repeatedly tries to access the OMNI files from the hg19 build (separated by chromosome). Is there a workaround for this step?
I will be using an LD reference based on the local dataset, so I do not need the HapMap data, but I will need to calculate the correlation values from the matrix in this step.