Closed xuxwen closed 1 year ago
There are some divergence issues. Did you perform proper QC on the sumstats? If yes, then try `use_MLE = FALSE`, and decrease `shrink_corr` (e.g. 0.5-0.8).
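A minimal sketch of what this advice looks like in a call to `snp_ldpred2_auto()`. The objects `corr`, `df_beta`, and `h2_est` are assumed to come from the standard LDpred2 preprocessing steps (LD matrix and summary statistics); the exact `shrink_corr` value is an assumption within the suggested 0.5-0.8 range.

```r
library(bigsnpr)

# Sketch only: corr, df_beta, and h2_est are assumed to already exist
# from the usual LDpred2 preprocessing (on-disk LD matrix + sumstats QC).
multi_auto <- snp_ldpred2_auto(
  corr, df_beta,
  h2_init = h2_est,
  vec_p_init = seq_log(1e-4, 0.2, length.out = 30),
  use_MLE = FALSE,    # more robust sampling when chains diverge (needs bigsnpr >= 1.11.9)
  shrink_corr = 0.7,  # assumed value; try 0.5-0.8 instead of the default
  ncores = 4          # adjust to your machine
)
```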
Yes, the eQTL data I used had MAF > 1% for all SNPs.
But I would still like to know which specific steps of QC you are referring to on your side?
Thanks, Florian! Yes, our sumstats have all been through QC. But now there is still a new problem: we are using OncoArray GWAS data (41,633 individuals) to predict outcomes, but in the pred_auto_sparse results many individuals were predicted to be 0 (which they should not be unless the individual results are meaningless), and many null values were found. The following images show the pred_auto_sparse results:
0s or NAs? Do you have missing values in your data (genotypes) that appear when you calculate the polygenic scores (by the product of genotypes x effect sizes)?
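The product Florian describes can be computed and inspected directly. A sketch, assuming the genotype matrix `G` (an `FBM.code256` from a bigSNP object), the averaged effect sizes `beta_auto`, and the tutorial-style column indices `df_beta[["_NUM_ID_"]]` (all names assumed from the LDpred2 tutorial, not from this thread):

```r
library(bigsnpr)

# PGS = G %*% beta, restricted to the variants used in LDpred2.
# Any NA genotype makes that individual's score NA, which would
# explain blank lines in the written file.
pred <- big_prodVec(G, beta_auto, ind.col = df_beta[["_NUM_ID_"]])

sum(is.na(pred))   # individuals with at least one missing genotype in the score
sum(pred == 0, na.rm = TRUE)  # individuals with a score of exactly 0
```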
Our OncoArray data was already imputed with TOPMed, so I'll have to take a closer look to see whether there are any missing values, and I'll get back to you later! Please forgive me!
The OncoArray genotype data has no missing values! I would also like to ask whether multiple isogenic loci will affect the results?
Could you check in R (before writing to the file)?
There should be no problem: a fellow researcher built a PRS from methylation sites using the same code and the same auto model, and their pred_auto and pred_auto_sparse outputs show predicted values for every individual. I just can't understand it!
Without more information, I can't help you more.
What more information is needed? I'll provide as much as I can and am very keen to address this!
I suspect these blank lines in the file are missing values, not 0s. Can you check in R in the vector that you write to the file?
Then, if these are missing values, you need to check whether these come from the PGS effect sizes (not likely), or the genotype matrix (most likely).
I have checked the vector in R and confirmed that the blank lines are missing values (NA), not 0s. I have also checked the genotype matrix and was able to confirm that there are no missing values!
If you have missing values when you compute the PGS, it must be that you have missing values in the genotype matrix. How did you check there was no missing value? You can e.g. use `big_counts()`. No missing values in the effect sizes as well?
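A sketch of both checks with `big_counts()` from bigstatsr (re-exported by bigsnpr); `G` and `beta_auto` are the assumed genotype matrix and effect-size vector from earlier in the thread:

```r
library(bigstatsr)

# For an FBM.code256 genotype matrix, big_counts() returns a matrix with
# one column per variant and rows counting genotypes 0, 1, 2, and NA.
counts <- big_counts(G)
nb_na <- counts[4, ]        # row 4 = number of missing values per variant
sum(nb_na)                  # total missing genotype calls across the matrix
hist(nb_na / nrow(G))       # distribution of per-variant missingness rates

# And check the effect sizes themselves:
sum(is.na(beta_auto))
```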
Thanks again! I followed your advice and used `sum(big_counts(G)[4, ])`, and the final return value is 514272579. What does this step mean? What does the return value represent?
`big_counts(G)[4, ]` should give you a vector with the number of missing values for each variant.
Thank you very much! We looked carefully again and there were indeed missing values: in the OncoArray data, out of a total of 18,610,000 variants, about 30,000 variants have missing values (geno 0.05)! This is most likely because we were too conservative when we first did the TOPMed imputation, considering that OncoArray is a merger of two samples! Thanks again for your careful answer! I'd like to make a final enquiry here about how long it would take to impute 18.6 million variants using `G2 <- snp_fastImputeSimple(G, method = "mean2")`. Thanks again! I wish you all the best!
Depends on the number of samples I guess, but it might take a while. Count maybe 1 hour / 100GB of the data.
Note that you only need the variants that were used in LDpred2, so maybe you could have read only those to the bigSNP format.
Make sure these variants do not have too many missing values (< 10%) before using `snp_fastImputeSimple()`.
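A sketch of this workflow, assuming the genotype matrix `G` from earlier in the thread (the missingness check and the choice of 4 cores are assumptions, not part of the original advice):

```r
library(bigsnpr)

# Check per-variant missingness before imputing; ideally all rates < 10%.
nb_na <- big_counts(G)[4, ]
summary(nb_na / nrow(G))

# "mean2" replaces each NA by the variant's mean dosage, rounded to 2 decimals.
G2 <- snp_fastImputeSimple(G, method = "mean2", ncores = 4)
```

Restricting `G` to only the variants used in LDpred2 before this step (as suggested above) would make both the check and the imputation much faster.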
Any update on this?
Hi Florian!
I am attempting to compute some PRS weights using LDpred2-auto with the `snp_ldpred2_auto()` function. I often get this error when I execute the following code:

```r
multi_auto <- snp_ldpred2_auto(
  corr, df_beta, h2_init = h2_est,
  vec_p_init = seq_log(1e-4, 0.2, length.out = 30),
  ncores = NCORES,
  burn_in = 500, num_iter = 500, report_step = 20,
  use_MLE = FALSE  # uncomment if you have convergence issues (need v1.11.9)
)

range <- sapply(multi_auto, function(auto) diff(range(auto$corr_est)))
keep <- which(range > (0.95 * quantile(range, 0.95, na.rm = TRUE)) & range < 3)
beta_auto <- rowMeans(sapply(multi_auto[keep], function(auto) auto$beta_est))
```
If you check my `range`, you will see that the values are all NA! What could be the problem?