Open tskir opened 3 years ago
I might add that there's also the option to use
for a duplication if you know it exists but aren't sure if it's tandem or not, or if you know that it's not tandem but aren't sure about the breakend structure.
No, DUP if you know directionality, whatever breakend or how many alleles are involved, as long as it results in >2 (or better >referenceAlleleCount) alleles in total, even w/ LOH. A case of phased duplication of 1 allele w/ loss of the other OTOH would not represent a DUP, unless e.g. the replacing allele carries a sequence elongation (e.g. TANDEM).
DUP: Net gain of allelic count between 2 positions without need of knowledge about their placement https://github.com/samtools/hts-specs/pull/465#discussion_r580265855
... though I would have no problem w/ types of CNV:DUP, CNV:DEL or such.
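To make the "net gain" reading above concrete, here is a minimal Python sketch. It assumes each haplotype is summarised by how many copies of the segment it carries; `is_net_gain` and `reference_allele_count` are hypothetical names used for illustration only, and nothing here is taken from the spec. The sequence-elongation exception (e.g. TANDEM on the replacing allele) is not modelled.

```python
def is_net_gain(allele_copies, reference_allele_count=2):
    """Return True if the total copy count exceeds the reference allele count.

    allele_copies: copies of the segment carried by each haplotype
    (hypothetical representation, assumed for this sketch).
    """
    return sum(allele_copies) > reference_allele_count


# One haplotype carries a duplication, the other is reference: 2 + 1 = 3 > 2.
print(is_net_gain([2, 1]))
# Gain with LOH: 3 copies on one haplotype, none on the other: 3 > 2.
print(is_net_gain([3, 0]))
# Phased duplication of one allele with loss of the other: 2 + 0 = 2, no net gain.
print(is_net_gain([2, 0]))
```

Under this reading, only the total matters, not where the extra copies are placed.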
@mbaudis Doesn't the interpretation of the meaning of DUP depend on what is defined in the sample GT field? I would interpret <DUP>,<DEL> SVCLAIM=CN GT=1/2 as a copy number neutral LOH, but a <DUP>,<DEL> SVCLAIM=CN GT=1 as a copy number gain with no allele-specific copy number breakdown (although the alternative interpretation is a copy number gain with LOH since only one haplotype is defined).
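The two readings in the comment above can be sketched in Python as follows. This only encodes the commenter's proposed interpretation, not anything the spec defines; `called_alleles` and `interpret` are hypothetical helper names, and the returned labels are made up for illustration.

```python
import re


def called_alleles(alts, gt):
    """Map GT indices (separated by '/' or '|') to REF/ALT symbols."""
    alleles = []
    for idx in re.split(r"[/|]", gt):
        if idx == ".":
            alleles.append(None)  # missing call
        elif idx == "0":
            alleles.append("REF")
        else:
            alleles.append(alts[int(idx) - 1])  # GT index 1 is the first ALT
    return alleles


def interpret(alts, gt):
    """Apply the interpretation proposed in the comment (an assumption)."""
    alleles = called_alleles(alts, gt)
    called = sorted(a for a in alleles if a is not None)
    if len(alleles) == 2 and called == ["<DEL>", "<DUP>"]:
        return "copy-number-neutral LOH"
    if alleles == ["<DUP>"]:
        return "copy number gain, no allele-specific breakdown"
    return "not covered by this sketch"


alts = ["<DUP>", "<DEL>"]
print(interpret(alts, "1/2"))
print(interpret(alts, "1"))
```

The single-haplotype case is exactly where the ambiguity sits: the same record could also be read as a gain with LOH, which this sketch does not try to resolve.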
<DUP> without any GT information is either a claim of overall copy number gain, or a claim of a copy number increase of at least one allele. If SVCLAIM=BP then it is clearly the latter, but it's unclear what it means when SVCLAIM=CN. I presume copy number callers generally use the former interpretation, correct?
"either a claim of overall copy number gain, or a claim of a copy number increase of at least one allele"
4.4 has INFO CN = ASCN and FORMAT CN = overall CN.
Comments extracted from discussions (1, 2) in #465
@cwhelan
@jmmut Maybe this is a silly question, but what does "multiallelic" mean here? Is it the same as "may be both deletion and duplication", i.e. that both copy number 0 and copy number greater than 1 are allowed? "Multiallelic" doesn't look clear to me.
@cwhelan Multiallelic could mean either the presence of both a simple deletion and duplication allele for the segment, or that there are duplication alleles with differing numbers of copies. For example, one allele might have a single tandem duplication of the segment (and therefore two copies of the reference segment) and another allele might have three or more copies. See for example https://www.ncbi.nlm.nih.gov/pubmed/25621458.
@tskir I mostly agree with @cwhelan's definition, although this needs to be reflected in the spec itself (not just this discussion)
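As a rough illustration of @cwhelan's definition, the following Python sketch treats a site as multiallelic when more than one distinct non-reference per-allele copy number is observed. It assumes a reference allele carries exactly one copy of the segment; `is_multiallelic` is a hypothetical name, and this is one possible reading of the comment rather than wording from the spec.

```python
def is_multiallelic(allele_copy_numbers, reference_copies=1):
    """True if at least two distinct non-reference allele states are present.

    allele_copy_numbers: per-allele copy counts of the segment observed at
    the site (hypothetical input format, assumed for this sketch).
    """
    non_reference = {cn for cn in allele_copy_numbers if cn != reference_copies}
    return len(non_reference) > 1


# A simple deletion allele (0 copies) and a duplication allele (2 copies).
print(is_multiallelic([0, 1, 2]))
# Duplication alleles with differing numbers of copies (2 and 3).
print(is_multiallelic([1, 2, 3]))
# Only one kind of duplication allele: biallelic under this reading.
print(is_multiallelic([1, 2, 2]))
```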