samtools / hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats
http://samtools.github.io/hts-specs/

Clarify the relationship between CNV and DEL/DUP structural variation types #548

Open tskir opened 3 years ago

tskir commented 3 years ago

Comments extracted from discussions (1, 2) in #465

@cwhelan

I might add that there's also the option to use <CNV> for a duplication if you know it exists but aren't sure if it's tandem or not, or if you know that it's not tandem but aren't sure about the breakend structure.


@jmmut Maybe this is a silly question, but what does "multiallelic" mean here? Is it the same as "may be both deletion and duplication", meaning that both copy number 0 and copy number greater than 1 are allowed? "Multiallelic" isn't clear to me.

@cwhelan Multiallelic could mean either the presence of both a simple deletion and duplication allele for the segment, or that there are duplication alleles with differing numbers of copies. For example, one allele might have a single tandem duplication of the segment (and therefore two copies of the reference segment) and another allele might have three or more copies. See for example https://www.ncbi.nlm.nih.gov/pubmed/25621458.

@tskir I mostly agree with @cwhelan's definition, although this needs to be reflected in the spec itself (not just this discussion)
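
To make the two readings of "multiallelic" in @cwhelan's definition concrete, here is a minimal sketch. It assumes each allele is summarised by how many copies of the segment it carries (reference = 1); the function name and the returned labels are hypothetical and not taken from the spec.

```python
def classify_cnv_site(allele_copies):
    """Classify a CNV site from per-allele copy counts of the segment.

    allele_copies: iterable of non-negative ints, one per distinct allele
    observed at the site (1 = reference, 0 = deletion, >= 2 = duplication).
    """
    counts = set(allele_copies)
    if any(c < 0 for c in counts):
        raise ValueError("copy counts must be non-negative")

    non_ref = {c for c in counts if c != 1}
    if not non_ref:
        return "reference only"
    if len(non_ref) == 1:
        return "biallelic"

    has_del = 0 in non_ref
    has_dup = any(c > 1 for c in non_ref)
    if has_del and has_dup:
        # First reading: a deletion allele and a duplication allele coexist.
        return "multiallelic: deletion and duplication alleles"
    # Second reading: duplication alleles with differing numbers of copies.
    return "multiallelic: duplication alleles with differing copy counts"


print(classify_cnv_site([1, 0, 2]))  # deletion and duplication alleles
print(classify_cnv_site([1, 2, 3]))  # duplication alleles with differing copy counts
```

Under this sketch a site with only a reference allele and a single two-copy duplication allele comes out as "biallelic", which matches the distinction drawn in the definition above.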

mbaudis commented 3 years ago

I might add that there's also the option to use <CNV> for a duplication if you know it exists but aren't sure if it's tandem or not, or if you know that it's not tandem but aren't sure about the breakend structure.

No: use DUP if you know the directionality, whatever the breakend structure or however many alleles are involved, as long as it results in >2 (or better, >referenceAlleleCount) alleles in total, even with LOH. A phased duplication of one allele with loss of the other, on the other hand, would not represent a DUP, unless e.g. the replacing allele carries a sequence elongation (e.g. TANDEM).

DUP: Net gain of allelic count between 2 positions without need of knowledge about their placement https://github.com/samtools/hts-specs/pull/465#discussion_r580265855

... though I would have no problem w/ types of CNV:DUP, CNV:DEL or such.

d-cameron commented 3 years ago

@mbaudis Doesn't the interpretation of the meaning of DUP depend on what is defined in the sample GT field? I would interpret <DUP>,<DEL> SVCLAIM=CN GT=1/2 as a copy number neutral LOH, but a <DUP>,<DEL> SVCLAIM=CN GT=1 as a copy number gain with no allele-specific copy number breakdown (although the alternative interpretation is a copy number gain with LOH since only one haplotype is defined).

<DUP> without any GT information is either a claim of overall copy number gain, or a claim of a copy number increase of at least one allele. If SVCLAIM=BP then it is clearly the latter, but it's unclear what it means when SVCLAIM=CN. I presume copy number callers generally use the former interpretation, correct?
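
The GT-dependent reading above can be sketched roughly as follows. This is only an illustration under loud assumptions: a haplotype carrying `<DUP>` is counted as exactly 2 copies (in reality 2 is only a lower bound), `<DEL>` as 0, the reference as 1, and the expected copy number is taken to be the number of alleles listed in GT. The function name and the copy table are hypothetical, and missing alleles (`.`) are not handled.

```python
import re

# ASSUMPTION: copies of the segment contributed by one haplotype per allele.
# <DUP> is fixed at 2 here, although a duplication allele may carry more.
ASSUMED_COPIES = {"REF": 1, "<DUP>": 2, "<DEL>": 0}


def interpret_gt(alts, gt):
    """Return (total_copies, state, loh) for a GT string such as '1/2'.

    alts: list of ALT alleles in record order, e.g. ['<DUP>', '<DEL>'].
    gt:   genotype string; '/' and '|' separators are both accepted.
    """
    indices = [int(i) for i in re.split(r"[/|]", gt)]
    alleles = ["REF" if i == 0 else alts[i - 1] for i in indices]
    per_haplotype = [ASSUMED_COPIES[a] for a in alleles]

    total = sum(per_haplotype)
    baseline = len(alleles)  # ASSUMPTION: ploidy is implied by GT
    if total > baseline:
        state = "gain"
    elif total < baseline:
        state = "loss"
    else:
        state = "neutral"

    # LOH here means: one haplotype is lost entirely while copies remain.
    loh = baseline > 1 and 0 in per_haplotype and total > 0
    return total, state, loh


print(interpret_gt(["<DUP>", "<DEL>"], "1/2"))  # (2, 'neutral', True)
print(interpret_gt(["<DUP>", "<DEL>"], "1"))    # (2, 'gain', False)
```

With GT=1 only one haplotype is described, so the sketch cannot tell a plain gain apart from a gain with LOH, which is exactly the ambiguity noted above.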

d-cameron commented 2 years ago

either a claim of overall copy number gain, or a claim of a copy number increase of at least one allele

VCF 4.4 has INFO CN = ASCN (allele-specific copy number) and FORMAT CN = overall CN.