Closed pmk2020 closed 3 years ago
Please look at parameter match.min.prop
in the documentation.
It is probably that the positions are not in the same genome build.
You can either convert them to another build using snp_modifyBuild()
or use rsids by setting join_by_pos = FALSE
instead if they are available in the two datasets (prefer the first option).
How could the documentation be improved to help explaining this?
Thank you, Florian. I have a follow up question.
How much minimum proportion match is good to run LDpred2 on? And would it be ok to have different match.min.prop for each chromosome?
With good sumstats and HM3 variants, you should get at least 70% I guess. The default is 50%.
and it would depend on the chromosome too, right? because I am able to read at 10% matching for chromosome 22
No, I guess it should not.
Have you tried with rsIDs instead? Do you know in which builds are the positions?
Ahh so then maybe its not well-tailored with HM3 variants because 10% matching is quite low, I feel.
They are in build37. I dont have rsids for all SNPs in summary stats so I have not tried it.
Okhay this maybe a basic question, but the genotype file to use for LDPred2, can it be my own genetic data file or does it have to be the genetic file provided by you?
No, you can compute the correlation matrix with whatever data you want, as long as it is not too small, and the ancestry matches the ancestry of the GWAS data from which you got the sumstats.
So do you think 10% minimum matching proportion should be ok to run LDpred2?
I don't think 10% is okay at all. Do you get this with the data you have or with the HM3 data I provide?
With data I have, but it is UK biobank chromosome 22 QC'ed data
This is the code:
info_snp <- snp_match(sumstats, map, match.min.prop = 0.1) 10,707,059 variants to be matched. 19,331 ambiguous SNPs have been removed. Some duplicates were removed. 117,501 variants have been matched; 0 were flipped and 102,410 were reversed.
Okay, I didn't get it, you have only chromosome 22 here... Yeah, 100K variants matched is even too large. You would get at least 400K for the largest chromosomes, and you cannot really run LDpred2 with that many variants. I would recommend restricting to HM3.
Yes, I only have chromosome 22 right now here.
Just to confirm with HM3 you mean the hm3_variants.rds (in the data-raw) folder, right?
No, I mean the info
object in the LDpred2 tutorial at https://privefl.github.io/bigsnpr/articles/LDpred2.html.
Thank you, Florian! I will use map.rds for my analysis. Hopefully the matching proportion is not that low then.
Hi Florian, I had a quick question while using the info object. I am following the code: example-with-provided-ldref.R code to run the analyses.
In the underneath code, where does the map_test come from?
# Here, you also want to restrict to the variants present in your test data as well. For this, you can use something like
in_test <- vctrs::vec_in(df_beta[, c("chr", "pos")], map_test[, c("chr", "pos")]) df_beta <- df_beta[in_test, ]
Also if I am running "auto" , you mentioned that it does not require any validation set in the manuscript, so I am wondering why should there be a test set in above code? I maybe misunderstanding the underlying aspect, therefore any help is much appreciated.
Thank you.
The test set is the set of individuals for which you want to derive the polygenic scores. You need to have the same variants in this set as the ones you used as input for LDpred. Otherwise, you'll get weights for variants you do not have!
Sorry to bother you again, Florian! If I am using map.rds (ld-reference dataset), do I still need to provide an individual level dataset in order to run pred_inf/grid/auto? Since these functions require 'G' which is not present in the map.rds?
Thank you!
Yes, you need a test set, right? What are you using the PGS for?
My test set would be UKBB, which during snp_match() may create the same problem (as above), right?
I am using it for outcome-related phenotypes (e.g. survival)
If it's UKBB, you're fine. All the variants will be available as the LD reference was made from UKBB variants.
Thank you, Florian!
Hi Florian, Firstly, this is a great package and I am excited to use it. I came across the following error while using snp_match()
10,707,059 variants to be matched. 20,946 ambiguous SNPs have been removed. Some duplicates were removed. 122,195 variants have been matched; 0 were flipped and 103,812 were reversed. Error: Not enough variants have been matched.
Not sure, why I am receiving this error message. Any help is appreciated :) Thank you.