Hi,
Negative variance estimates happen when genetic signal is absent from the SNP summary statistics provided. LAVA essentially decomposes the variance observed in the data into the true genetic variance and noise due to sampling error, and so it subtracts that noise from the total. The result can therefore end up negative if the true genetic variance is (close to) zero. This is very similar to the adjusted R² in multiple linear regression.
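To make the adjusted R² comparison concrete, here is a minimal sketch of the standard adjusted R² formula. It is not LAVA code. It only shows how subtracting the fit expected by chance can push a weak fit below zero, while a strong fit stays positive. The sample size and predictor counts are arbitrary illustration values.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a regression with n observations and p predictors.

    The raw R^2 is inflated by the fit expected from chance alone; the
    adjustment removes that expected chance fit, so the result can drop
    below zero when the true explained variance is (close to) zero.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Weak fit with many predictors: the chance correction outweighs the raw R^2.
print(round(adjusted_r2(0.01, n=100, p=10), 3))  # -0.101
# Strong fit: the correction is small relative to the signal.
print(round(adjusted_r2(0.50, n=100, p=10), 3))  # 0.444
```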
In that sense, there isn't much that can be done about it. Partly it is a power issue: if there is at least some genetic variance present in that locus, larger samples with a better signal-to-noise ratio could still start to tap into it. But it may also be that there is simply no genetic signal there at all, in which case the correlation doesn't exist there either.
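The power argument can be illustrated with a toy simulation. This is a deliberately simplified "observed minus expected noise" estimator on independent simulated z-scores, not LAVA's actual estimator. The per-SNP effect `beta`, the 50 SNPs, and the sample sizes are assumptions chosen for illustration. With a small true effect, the share of negative estimates falls as the sample size grows. With no effect at all, roughly half of the estimates stay negative no matter how large the sample is.

```python
import math
import random


def toy_signal_estimate(n: int, beta: float, n_snps: int, rng: random.Random) -> float:
    """Toy 'observed minus expected noise' estimate from simulated z-scores.

    Each z-score is sqrt(n) * beta plus N(0, 1) sampling noise, so
    E[z^2] = n * beta^2 + 1. Subtracting the known noise term (1) and
    rescaling gives an unbiased but noisy estimate of beta^2.
    """
    zs = [math.sqrt(n) * beta + rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    mean_z2 = sum(z * z for z in zs) / n_snps
    return (mean_z2 - 1.0) / n


def fraction_negative(n: int, beta: float, n_snps: int = 50,
                      reps: int = 2000, seed: int = 0) -> float:
    """Share of simulated replicates whose toy estimate falls below zero."""
    rng = random.Random(seed)
    negatives = sum(toy_signal_estimate(n, beta, n_snps, rng) < 0 for _ in range(reps))
    return negatives / reps


for n in (1_000, 10_000, 100_000):
    # Columns: sample size, small true effect, no true effect.
    print(n, fraction_negative(n, beta=0.01), fraction_negative(n, beta=0.0))
```

The estimator is unbiased, so negative values are not an error in themselves. They are what sampling noise produces when the true signal is small relative to it.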
It is not impossible that the 400-SNP difference has some effect here, in the sense that those SNPs might carry a stronger genetic signal for the phenotype, which you would have picked up had they been available. Given how strong local LD usually is, though, in most cases I wouldn't expect that to lead to enormous differences in the variance estimates. After all, if there were strong associations in those 400 SNPs, then through LD you would expect a fair amount of that signal to also be reflected in the other SNPs in that region that you do have.
Good afternoon,
I am trying to perform a bivariate analysis on a 1 Mb locus and I get the following warning: "Negative variance estimate for phenotype X in locus Y; Dropping these as they cannot be analysed". After inspecting the locus in my summary statistics files, I noticed a difference of nearly 400 SNPs at this locus between the phenotypes that worked and the ones that didn't. Could this difference in coverage explain the negative variance estimates? If not, what could? Is there anything I can do about this issue?
Thank you very much for your response,
Regards,