hammerlab / prohlatype

Probabilistic HLA typing
Apache License 2.0

Try out on long reads #61

Open hammer opened 7 years ago

hammer commented 7 years ago

Oxford Nanopore

PacBio

hammer commented 7 years ago

From Bobby Sebra in relation to getting PacBio data:

We have been running the PacBio HLA protocol regularly for about the past 9 months, so this isn’t hard to do, especially for Class I. We can also call the haplotypes with the software we use, but you are welcome to do so with yours as well, and you’ll see that you will match Histo’s results and/or discover novel features, depending on whether Histo used their PacBio protocol or their commercial NGS protocol. We did this comparison maybe 1-1.5 years ago and saw the concordance.

We do Class II as well but, as you probably know, Class II often fails in some exons due to incorrect haplotypic referencing. This has plagued the space of HLA screening for ages, since not enough haplotypes are represented in the public databases and hence in the primers, etc.

We do Class I and II exons, but no assay actually does the entire HLA locus unless we do capture-based MHC / HLA assays. We can do those with Roche, but the cost is about 2 orders of magnitude higher per sample, since you’re then doing a de novo capture and assembly of the entire 3.4 Mbp HLA region. It’s doable and fairly easy if we have the $, but HLA sequencing typically means Class I or II exons.

No HLA kits release primers, since those are the bread and butter of their molecular work, so they are always held back. On the capture-based side, it’s just basic biotinylation chemistry like most capture kits, and you’re paying for the kit. The assembly and informatics part is actually very easy for HLA if doing de novo, so it’s just the manufacturing of the probes that costs more money and inflates the price. More genetic loci, more $$$ versus the standard HLA targeted kits, which focus on the genotypic exons and not introns etc. You will always get raw reads when we sequence and can use your own pipeline for sure, which is of course a less biased way to discover novel haplotypes than biased commercial lookup tables and reference-based methods.

We have our own primers in addition to some commercial ones, but for HLA Class II no one has quite figured out the “perfect mixture” to capture all haplotypes, because there aren’t many good PacBio references across various haplotypes, so you end up using NGS data that gives you medium resolution and obviously miss a lot of individuals. That’s all I mean: it’s an incomplete solution, both from our own primers and from the standpoint of everyone else’s. What we’d need to do is blow $50-100k doing it the expensive way across dozens (or hundreds) of haplotypes, then use that data to inform a new set of primers, synthesize those, and make a novel kit using only PacBio data, etc. But yeah, it’s doable and something we’re working on, but it requires so much cash and also the right samples, not just a random subsampling. It’s horrible how biased “human sequencing” data is, and that doesn’t help in this case, so it’s more about getting more diverse reference materials and human DNAs.