jzook / testHG008curation

This repository is primarily for the Genome in a Bottle Consortium to curate structural variants in HG008
https://www.nist.gov/programs-projects/cancer-genome-bottle

[chr11:68232231-68233056][BND][JoinedTo=chr3:192188390[] #230

Open jzook opened 2 weeks ago

jzook commented 2 weeks ago

chr11 68232231 Minda_18 N N[chr3:192188390[ . PASS SVLEN=824;SVTYPE=BND;SUPP_VEC=ILL_MantaBND:74046:1:2:0:0:0:0,ILL_gridss187ff_7736h,PB_severus_INV1172,PB_ID_48947_2,PB_Sniffles2.INV.245MA,ILL_94946437:2
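For readers less familiar with breakend notation, here is a minimal sketch of how the record above could be decoded, assuming the standard VCF breakend ALT forms (`t[p[`, `t]p]`, `]p]t`, `[p[t`). The function name `parse_bnd` and the output keys are hypothetical, and splitting `SUPP_VEC` on commas is only an assumption about how that field is structured.

```python
import re

# Record copied from the issue (whitespace-separated here; a real VCF uses tabs).
RECORD = (
    "chr11 68232231 Minda_18 N N[chr3:192188390[ . PASS "
    "SVLEN=824;SVTYPE=BND;SUPP_VEC=ILL_MantaBND:74046:1:2:0:0:0:0,"
    "ILL_gridss187ff_7736h,PB_severus_INV1172,PB_ID_48947_2,"
    "PB_Sniffles2.INV.245MA,ILL_94946437:2"
)

# Breakend ALT: optional local bases, a bracket, mate chrom:pos,
# the same bracket again, optional local bases.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)(?P<bracket>[\[\]])"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P=bracket)(?P<trail>[ACGTNacgtn]*)$"
)


def parse_bnd(record):
    """Decode one BND record into a small dict (hypothetical helper)."""
    chrom, pos, _id, _ref, alt, _qual, _filter, info = record.split(maxsplit=7)
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "mate_chrom": match["chrom"],
        "mate_pos": int(match["pos"]),
        # Local base written first: the mate piece is joined after it.
        "local_before_mate": bool(match["lead"]),
        # "[" means the mate piece extends to the right of mate_pos.
        "mate_extends_right": match["bracket"] == "[",
        "svtype": info_map.get("SVTYPE"),
        # Assumption: comma-separated supporting call identifiers.
        "support": info_map.get("SUPP_VEC", "").split(","),
    }


if __name__ == "__main__":
    print(parse_bnd(RECORD))
```

Under that reading, the record joins chr11 up to 68232231 to the chr3 sequence extending rightward from 192188390.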

https://v2.genomeribbon.com/?session=https://42basepairs.com/download/s3/giab-data/ribbon-json/ribbon-hg008-cov-bedpe.json&locus=chr11:68232231#ribbon

aganezov commented 1 week ago

This is a complex somatic variant (or a pair of variants, if paths are enumerated), which can best be described as the following paths:

  1. chr3: ... - 192,188,390 (+) -> chr11: 68,233,077 - ... (+)
  2. chr11: ... - 68,232,234 (+) -> chr11: 68,232,229 - 68,233,058 (-) -> chr3:192,188,394 - ... (+)
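The two paths above can be written down as ordered, oriented segments, from which the novel adjacencies (breakend pairs) follow mechanically. This is only an illustrative sketch: the `Segment` type and the `junctions` helper are hypothetical names, `None` stands for the open-ended `...` coordinates, and a `(-)` segment is assumed to be traversed from its higher coordinate to its lower one.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    chrom: str
    start: Optional[int]  # None = open-ended ("...")
    end: Optional[int]    # None = open-ended ("...")
    strand: str           # "+" or "-"

    def first_base(self) -> Optional[int]:
        # Coordinate at which the path enters this segment.
        return self.start if self.strand == "+" else self.end

    def last_base(self) -> Optional[int]:
        # Coordinate at which the path leaves this segment.
        return self.end if self.strand == "+" else self.start


PATH_1 = [
    Segment("chr3", None, 192_188_390, "+"),
    Segment("chr11", 68_233_077, None, "+"),
]

PATH_2 = [
    Segment("chr11", None, 68_232_234, "+"),
    Segment("chr11", 68_232_229, 68_233_058, "-"),
    Segment("chr3", 192_188_394, None, "+"),
]

Breakend = Tuple[str, Optional[int]]


def junctions(path: List[Segment]) -> List[Tuple[Breakend, Breakend]]:
    """Return the breakend pair at each join between consecutive segments."""
    return [
        ((a.chrom, a.last_base()), (b.chrom, b.first_base()))
        for a, b in zip(path, path[1:])
    ]


if __name__ == "__main__":
    for name, path in (("path 1", PATH_1), ("path 2", PATH_2)):
        for left, right in junctions(path):
            print(name, left, "->", right)
```

Listing the junctions this way gives one breakend pair for path 1 and two for path 2, which are the pairs examined one by one in the rest of this comment.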

These paths are captured by the haplotype-resolved assembly.

First, the first-path breakpoints at chr3:192,188,390 and chr11:68,233,077 are captured by PB, ONT, Element, and assembly alignments. There are no ambiguities there. On chr3 there is an apparent LOH event (see in the attached screenshot the heterozygous deletion in Normal and its transition to homozygous, i.e., covering all reads, in Tumor for both ONT and PB, though partially erroneous haplotagging can be observed in the PB tumor data). The assembly also suggests LOH in that region.

Screenshot 2024-10-08 at 14 39 55 Screenshot 2024-10-08 at 14 38 53

Second, the second path comprises 3 segments: two from chr11 first (a pair with an inversion and a 4 bp overlap), then a chr3 segment last.

It appears that an ONT read alignment artifact led to the inversion being aligned straight through, with lower match quality, based on the forward-strand sequence. Remapping the ONT reads with the same preset as the PB reads yields an identical rearranged genome structure, with the inverted segment present in both alignments (see the issue_230.ONT.remap.sort.bam track for the remapped ONT reads). See the following screenshot for the difference between the same read's alignments under the two setups. Screenshot 2024-10-08 at 19 26 25

Starting on the right, the chr3 breakpoint at 192,188,394 is supported across all sequencing technologies as well as the assembly. Screenshot 2024-10-08 at 14 51 41

Its paired breakend location, chr11:68,232,229, is supported by Element, remapped ONT, and the assembly. PB shows the breakpoint at 68,232,231 instead. Screenshot 2024-10-08 at 19 16 56

Continuing via the second path, we go to the other end of the inverted segment, where the breakend chr11:68,233,058 is supported by Element, ONT (remapped), and assembly. PB shows a shifted breakend location of chr11:68,233,055. Screenshot 2024-10-08 at 19 19 06

Its paired breakend is located at chr11:68,232,234 and is supported by PB, ONT (remapped), Element, and the assembly. Screenshot 2024-10-08 at 19 20 57
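To make the PB discrepancies at the two ends of the inverted segment explicit, the reported positions can be tabulated and compared. This is a small sketch using only the coordinates quoted above. Taking the assembly position as the reference is an assumption made for illustration, and the dictionary labels and the `offsets` helper are hypothetical.

```python
# Breakend positions on chr11 as reported above, per data source.
REPORTED = {
    "inverted segment, lower end": {
        "Element": 68_232_229,
        "ONT (remapped)": 68_232_229,
        "Assembly": 68_232_229,
        "PB": 68_232_231,
    },
    "inverted segment, upper end": {
        "Element": 68_233_058,
        "ONT (remapped)": 68_233_058,
        "Assembly": 68_233_058,
        "PB": 68_233_055,
    },
}


def offsets(positions, reference="Assembly"):
    """Return the signed offset in bp of each source that disagrees with the reference."""
    ref = positions[reference]
    return {tech: pos - ref for tech, pos in positions.items() if pos != ref}


if __name__ == "__main__":
    for breakend, positions in REPORTED.items():
        print(breakend, offsets(positions))
```

With these inputs, PB is the only source that differs: by +2 bp at the lower end and by -3 bp at the upper end.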

It appears that all involved SV signatures (i.e., pairs of breakends) affect only a single haplotype on chr11 (HP2 for the chr11 breakends). On chr3, which shows signs of LOH, they affect all reads, i.e., all reads on the one remaining haplotype. This suggests that all the described SVs are clonal.

jzook commented 1 week ago

This is related to #171 and #172, which @aysegokce curated (see additional screenshots there). Because of the inversion at the breakpoint on chr11, this is complex, and we'll have to decide whether to include it in the initial benchmark and, if so, how to represent it.