Using --genome in PLINK to obtain IBD doesn't seem to work

Hi,

I tried using --genome in PLINK to calculate IBD based on a selection of SNPs (44 in my case). It turns out that in my data:

- 7 SNPs have haploid genotypes
- when merging the GWAS data of the same study with the 450K samples, there should be an overlap of 451 samples, but based solely on these 44 SNPs the overlap is only about 90. That number matches the following:
  - I have 593 450K samples in total
  - 95 are blood-derived
  - 498 are tissue-derived
  - 89 samples overlap between blood- and tissue-derived.

So, in other words, for now my conclusion is that you can't use these SNPs to correlate the genetic data to the epigenetic data as a form of QC to determine who's who. Or at least, not using PLINK's IBD with --genome.

What are the commands that you use to cluster samples based on this type of data? So, I mean the combined set of GWAS and 450K/EPIC data to determine the correlation between samples.
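
To make the question concrete, here is a minimal sketch of the kind of matching I have in mind. It is not a PLINK replacement. It assumes genotypes from both platforms are already coded as 0/1/2 minor-allele counts over the same SNPs in the same order, with `None` for missing calls. The sample IDs, the toy genotypes and the function names `pearson` and `best_matches` are all made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation over SNPs where both samples have a call."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        # A constant genotype vector has no defined correlation.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def best_matches(gwas, array):
    """For every GWAS sample, return (array_sample_id, r) of the best match."""
    result = {}
    for gwas_id, gwas_calls in gwas.items():
        scored = []
        for array_id, array_calls in array.items():
            r = pearson(gwas_calls, array_calls)
            if r == r:  # skip NaN
                scored.append((array_id, r))
        result[gwas_id] = max(scored, key=lambda t: t[1]) if scored else None
    return result


# Toy genotypes (hypothetical): 6 SNPs, coded 0/1/2, None = missing call.
gwas = {
    "G1": [0, 1, 2, 2, 0, 1],
    "G2": [2, 2, 0, 1, 1, 0],
}
array = {
    "M_a": [2, 2, 0, 1, None, 0],
    "M_b": [0, 1, 2, 2, 0, 1],
    "M_c": [1, 1, 1, 0, 2, 2],
}

matches = best_matches(gwas, array)
for gwas_id, hit in matches.items():
    print(gwas_id, hit)
```

With only 44 SNPs, several of them haploid, I would expect the best match to be ambiguous for some samples, so the correlation itself should be inspected and not just the top hit.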

Best,

Sander

Typically I just plot the samples. For TARGET AML, for example, I have several hundred runs from kids with leukemia: some on HM27, some on HM450, some on EPIC, and some from multiple time points (dx, rm, relapse), with or without duplicates. I haven't matched them up with SNP6 or Illumina WGS, but I could, and will probably do exactly that once this year's ASH abstracts are in.

For the NA12878 matching, I manually verified everything against the Genome-In-A-Bottle (GIAB) assembly (constructed from NA12878 and her parents NA12891 and NA12892) in high-confidence regions. The ones I couldn't verify are left blank.

This does not preclude differences in targets between what the oligos are supposed to bind (on SNP6 and/or Infinium) and what they actually do bind...

That said, I usually apply hierarchical clustering with Ward's method to line up samples and it usually works. Unless I'm looking for recombination or pedigree concordance, IBD/IBS isn't super important to me.
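
As a rough illustration of the clustering step (a sketch under assumptions, not the exact code used here): the example below runs agglomerative clustering with Ward's criterion, using the Lance-Williams update on squared Euclidean distances. It assumes the input is one vector of beta values per sample at the SNP probes, so that runs from the same person end up close together. The sample names, the beta values and the function name `ward_clusters` are hypothetical.

```python
def ward_clusters(profiles, n_clusters):
    """Merge samples with Ward's criterion until n_clusters remain.

    profiles: dict of sample name -> list of numeric values (same length).
    Returns a sorted list of sorted member-name lists.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    names = list(profiles)
    # Active clusters: cluster id -> member sample names.
    members = {i: [name] for i, name in enumerate(names)}
    # Squared Euclidean distances between singletons, keyed by (low id, high id).
    dist = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = profiles[names[i]], profiles[names[j]]
            dist[(i, j)] = sum((x - y) ** 2 for x, y in zip(a, b))
    next_id = len(names)
    while len(members) > n_clusters:
        # Pick the pair whose merge increases within-cluster variance least.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        n_i, n_j = len(members[i]), len(members[j])
        updated = {}
        for k in members:
            if k in (i, j):
                continue
            n_k = len(members[k])
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            # Lance-Williams update for Ward's method.
            updated[(k, next_id)] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        # Drop distances that involve the two merged clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(updated)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Toy beta values at four SNP probes (hypothetical): roughly 0, 0.5 or 1
# depending on genotype.
betas = {
    "P1_dx": [0.02, 0.51, 0.97, 0.49],
    "P1_relapse": [0.05, 0.48, 0.95, 0.52],
    "P2_dx": [0.96, 0.03, 0.50, 0.98],
    "P2_rm": [0.93, 0.06, 0.47, 0.95],
    "P3_dx": [0.50, 0.97, 0.04, 0.03],
}

clusters = ward_clusters(betas, 3)
for cluster in clusters:
    print(cluster)
```

In practice the number of clusters is not known up front, so plotting the full dendrogram or heatmap and checking which runs land next to each other is usually more informative than cutting at a fixed count.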

--t


ttriche / infiniumSnps

Using --genome in PLINK to obtain IBD doesn't seem to work #2