Open xiangzhu opened 6 years ago
I agree that returning some kind of info about SNPs would be useful. I don't think it's useful or necessary to enforce that, though. One thing that would be quick and easy would be to add `colnames` and `rownames` to the LD matrix that match the `colnames` of the input SNP matrix; that way the user has the option of getting back SNP information, but doesn't need to make up fake SNP information if they don't have any (which comes up pretty often).
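A minimal sketch of the naming idea in R. The wrapper `ld_with_names()` and the SNP ids are hypothetical, and plain `cor()` stands in for the estimator; this is not LDshrink's actual code.

```r
# Toy genotype matrix: 5 individuals (rows) x 3 SNPs (columns).
# The SNP ids are made up for illustration.
geno <- matrix(
  c(0, 1, 2, 1, 0,
    1, 1, 2, 0, 0,
    2, 0, 1, 1, 2),
  nrow = 5,
  dimnames = list(NULL, c("rs1", "rs2", "rs3"))
)

# Hypothetical wrapper: estimate LD, then label rows and columns
# with the input column names.
ld_with_names <- function(geno) {
  # Stand-in estimator (sample correlation) on an unnamed matrix,
  # mimicking a low-level routine that returns no dimnames.
  r <- cor(unname(geno))
  snp_ids <- colnames(geno)  # NULL when the input carries no names
  dimnames(r) <- list(snp_ids, snp_ids)
  r
}

ld <- ld_with_names(geno)
```

With names attached, `ld["rs1", "rs3"]` looks up a pair by SNP id. An input without `colnames` simply yields an unnamed matrix, so nobody has to invent SNP information.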
@CreRecombinase yes, `colnames` and `rownames` seem to be sufficient in most cases.
Totally agree with the following:
> that way the user has the option of getting back SNP information, but doesn't need to make up fake SNP information if they don't have any (which comes up pretty often)
`LDshrink` doesn't have to give a `snp_info` when users don't have any.
There is one use case where having `snp_info` seems necessary. Suppose an analyst needs to analyze GWAS summary data of two traits together with LD estimates. For many SNPs, the `ALT` and `REF` alleles are swapped between the two traits. To properly flip the sign of `betahat` and/or the LD estimates, we need the `ALT` and `REF` info.
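A rough sketch of the sign flipping in R. All data below are made up, the column names (`snp`, `ref`, `alt`, `betahat`) are assumptions, and the toy LD matrix is assumed to be coded with trait 2's alleles; only exact REF/ALT swaps are handled, not strand flips.

```r
# Hypothetical summary statistics for the same SNPs in two traits.
trait1 <- data.frame(
  snp = c("rs1", "rs2", "rs3"),
  ref = c("A", "C", "G"),
  alt = c("G", "T", "A"),
  betahat = c(0.10, -0.20, 0.05),
  stringsAsFactors = FALSE
)
trait2 <- data.frame(
  snp = c("rs1", "rs2", "rs3"),
  ref = c("A", "T", "G"),   # rs2 has REF and ALT swapped relative to trait1
  alt = c("G", "C", "A"),
  betahat = c(0.08, 0.15, 0.02),
  stringsAsFactors = FALSE
)
stopifnot(identical(trait1$snp, trait2$snp))  # rows must be in the same SNP order

same    <- trait1$ref == trait2$ref & trait1$alt == trait2$alt
swapped <- trait1$ref == trait2$alt & trait1$alt == trait2$ref
stopifnot(all(same | swapped))  # anything else needs separate handling

# -1 for SNPs whose alleles are swapped, +1 otherwise.
flip <- ifelse(swapped, -1, 1)

# Express trait2 effects relative to trait1's ALT allele.
trait2$betahat_aligned <- flip * trait2$betahat

# Toy LD matrix, assumed to use trait2's allele coding.
ld <- matrix(c(1.0, 0.3, 0.1,
               0.3, 1.0, 0.2,
               0.1, 0.2, 1.0),
             nrow = 3, dimnames = list(trait2$snp, trait2$snp))

# A correlation changes sign when exactly one of the two SNPs is flipped.
ld_aligned <- ld * outer(flip, flip)
```

The diagonal is unchanged because each SNP is flipped against itself twice; only the entries pairing `rs2` with an unflipped SNP change sign.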
However, this won't be necessary if the analyst has already unified the `ALT` and `REF` of all GWAS summary data files before using `LDshrink`.
Finally, I think `emeraLD` can easily pull out `snp_info` because it uses `vcf` as input, and `vcf` already contains `snp_info`.
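To illustrate why a VCF makes this easy: the first five fixed columns of every VCF record are CHROM, POS, ID, REF and ALT. The sketch below reads them from a tiny made-up VCF using base R only; it is an illustration, not how `emeraLD` parses its input, and the output column names are assumptions.

```r
# A tiny made-up VCF (sites only, no genotype columns).
vcf_text <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t10583\trs1\tG\tA\t.\tPASS\t.",
  "1\t10611\trs2\tC\tG\t.\tPASS\t."
)

# Drop meta-information and header lines, which start with '#'.
body <- vcf_text[!startsWith(vcf_text, "#")]

# Read every field as character so alleles such as "T" are not
# converted to logical TRUE, then keep the first five fixed columns.
snp_info <- read.table(
  text = body, sep = "\t",
  colClasses = "character", stringsAsFactors = FALSE
)[, 1:5]
names(snp_info) <- c("chrom", "pos", "id", "ref", "alt")
snp_info$pos <- as.integer(snp_info$pos)
```

For real files, a dedicated VCF reader is a better choice than this line-based approach, which loads the whole file as text.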
It seems the main function only returns an estimated LD matrix at this point? https://github.com/stephenslab/LDshrink/blob/32b4ad3942f7cb429f23c529b86ab72cfbb1b257/R/LDshrink.R#L6
Ideally we want to have some basic SNP info available (e.g. position, allele), which is essential in combining LD with GWAS summary statistics in analyses.
I think the `emeraLD` package gives us a good example: https://github.com/statgen/emeraLD