jzook / testHG008curation

This repository is primarily for the Genome in a Bottle Consortium to curate structural variants in HG008
https://www.nist.gov/programs-projects/cancer-genome-bottle

[chr12:7380776-7380976][INS][SVLEN=200bp] #235

Open jzook opened 4 weeks ago

jzook commented 4 weeks ago

chr12 7380776 Minda_23 N <INS> . PASS SVLEN=200;SVTYPE=INS;SUPP_VEC=ONT_severus_INS11604,ONT_i_57,PB_severus_INS10140,PB_Sniffles2.INS.72MB,PB_i_142,PB_ID_57252_1,ONT_Sniffles2.INS.73MB
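To see at a glance which callers support this record, the INFO column can be split programmatically. The following is a minimal sketch, assuming the record is whitespace-delimited as pasted above and that the `ONT_` / `PB_` prefixes in SUPP_VEC denote the sequencing platform of each supporting call (that prefix convention is an assumption, not something stated in the record). `parse_info` is a hypothetical helper, not part of any tool used in this issue.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


record = (
    "chr12 7380776 Minda_23 N <INS> . PASS "
    "SVLEN=200;SVTYPE=INS;SUPP_VEC=ONT_severus_INS11604,ONT_i_57,"
    "PB_severus_INS10140,PB_Sniffles2.INS.72MB,PB_i_142,PB_ID_57252_1,"
    "ONT_Sniffles2.INS.73MB"
)

# The pasted record has eight whitespace-separated columns (CHROM..INFO).
chrom, pos, var_id, ref, alt, qual, flt, info = record.split()
info_fields = parse_info(info)

# Group supporting calls by the prefix before the first underscore
# (assumed here to be the platform: ONT or PB).
supporting = info_fields["SUPP_VEC"].split(",")
by_platform = {}
for call in supporting:
    platform = call.split("_", 1)[0]
    by_platform.setdefault(platform, []).append(call)

for platform, calls in sorted(by_platform.items()):
    print(platform, len(calls), calls)
```

Under that prefix assumption, the record lists seven supporting calls in total.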

https://v2.genomeribbon.com/?session=https://42basepairs.com/download/s3/giab-data/ribbon-json/ribbon-hg008-cov-bedpe.json&locus=chr12:7380776#ribbon

jo-mc commented 3 weeks ago

Conclusion: truncated LINE1 insertion. Appears heterozygous in IGV using the Revio data, /HG008-T_PacBio-HiFi-Revio_20240125_116x_CHM13v2.0.bam. Variation in poly-A length (~10 bp) is present.

About 300 bp downstream there appears to be a small 28 bp duplication. HOWEVER, it is also present in the PacBio normal, and it is NOT present in the Illumina data, /HG008-T_Illumina_195x_CHM13v2.0.bam (or the normal Illumina). (Could check the other PacBio data set?)

Dragen VCF confirms, but calls SOMATIC: PR:SR 83,0:79,0 78,20:75,22
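As a rough check on the somatic call, the alt-supporting fraction can be computed from the PR:SR values. This is only a sketch under stated assumptions: that each value is `ref,alt` for paired-read (PR) and split-read (SR) support, and that the first sample is the normal and the second the tumor. Neither the sample order nor the field layout is stated in the comment above, and pooling PR and SR counts is a simplification. `alt_fraction` is a hypothetical helper.

```python
def alt_fraction(pr_sr: str) -> float:
    """Fraction of alt-supporting evidence in a 'PRref,PRalt:SRref,SRalt' string.

    Assumes 'ref,alt' ordering within each pair; PR and SR counts are
    simply pooled, which is a simplification.
    """
    pr, sr = pr_sr.split(":")
    pr_ref, pr_alt = (int(x) for x in pr.split(","))
    sr_ref, sr_alt = (int(x) for x in sr.split(","))
    total = pr_ref + pr_alt + sr_ref + sr_alt
    return (pr_alt + sr_alt) / total if total else 0.0


# Sample order (normal first, tumor second) is an assumption.
samples = {"assumed_normal": "83,0:79,0", "assumed_tumor": "78,20:75,22"}
for name, value in samples.items():
    print(f"{name}: alt fraction = {alt_fraction(value):.3f}")
```

With these assumptions the first sample shows no alt support, while the second shows alt support in both the paired-read and split-read counts.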

chrom1: chr12
pos1: 7380775
strand1: -
chrom2: chr12
pos2: 7380975
strand2: +
variant_name: Minda_23
variant_type: INS
split: 1
size: 200
CNV_category: partial
category: simple
nearby_variant_count: 0

splitthreader

image

CHM13v2 liftover: chr12:7394114-7394314

Sample insertion sequence:

m84039_240113_032943_s4/195957961/ccs read Start:7377626 flag: 0 Ins_Count: 1 inspos 7394115 GGGACTGTGGTGGGGTAGGGGGAGGGGGGAGGGATAGCATTGGGAGATATACCTAATGCTAGATGACACATTAGTGGGTGCAGCGCACCAGCATGGCACATGTATACATATGTAACTAACCTGCACAATGTGCACATGTACCCTAAAACTTAGAGTATAATAAAAAAAAAAAAAGAAAGAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Reference

chr12:7394094-7394134 tgttccctaaaaacctatgg aaataaaaaa CTTTTTAAAA

Insertion potentially removed a portion of the center sequence, at or just before the insertion point ->CTTTTAAAA.

Insertions align to the 3' end of L1HS, consistent with a truncated LINE1 insertion and suggesting the insertion arose by LINE1 retrotransposition. No target site duplication is present; if one were, it would add weight to the LINE1 insertion call:

m84039_240114_012401_s1/166662538/ccs 0 L1HS 5860 60 205M12S
m84039_240113_032943_s4/78515397/ccs 0 L1HS 5860 60 140M1D64M10S
m84039_240113_032943_s4/60428294/ccs 0 L1HS 5860 60 205M12S
m84039_240114_012401_s1/54006952/ccs 0 L1HS 5860 60 205M9S
m84039_240113_032943_s4/205722724/ccs 0 L1HS 5860 60 205M12S
m84039_240113_032943_s4/213323044/ccs 0 L1HS 5860 60 205M13S
m84039_240113_032943_s4/138218943/ccs 0 L1HS 5860 60 205M17S
m84039_240113_032943_s4/168235307/ccs 0 L1HS 5860 60 205M11S
m84039_240114_012401_s1/102045215/ccs 0 L1HS 5860 60 205M13S
m84039_240113_032943_s4/195957961/ccs 0 L1HS 5860 60 205M11S
m84039_240113_032943_s4/234622120/ccs 0 L1HS 5860 60 205M11S
m84039_240114_012401_s1/126026272/ccs 0 L1HS 5860 60 30M1I38M1I87M2D48M13S
m84039_240114_012401_s1/112856184/ccs 0 L1HS 5860 60 205M13S
m84039_240113_032943_s4/41156689/ccs 0 L1HS 5860 60 205M9S
m84039_240114_012401_s1/175182206/ccs 0 L1HS 5860 60 205M12S
m84039_240114_012401_s1/105514263/ccs 0 L1HS 5860 60 23M1I182M9S
m84039_240114_012401_s1/189464862/ccs 0 L1HS 5860 60 205M11S
m84039_240113_032943_s4/30670958/ccs 0 L1HS 5860 60 205M13S
m84039_240114_012401_s1/102371851/ccs 0 L1HS 5860 60 205M8S
m84039_240113_032943_s4/132516756/ccs 0 L1HS 5860 60 205M9S
m84039_240113_032943_s4/207487763/ccs 0 L1HS 5860 60 205M9S
m84039_240114_012401_s1/29100589/ccs 0 L1HS 5860 60 205M12S
m84039_240114_012401_s1/174391730/ccs 0 L1HS 5860 60 16M1D188M10S
m84039_240114_012401_s1/219091476/ccs 0 L1HS 5860 60 119M1D52M10I32M
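The alignments above can be summarized by parsing each CIGAR string: the reference-consuming operations give the span covered on L1HS, and the trailing soft clip gives the unaligned tail length. The following is a minimal sketch, assuming the columns shown are read name, flag, reference, 1-based position, MAPQ and CIGAR (SAM order). Interpreting the trailing soft clip as poly-A tail beyond the aligned L1HS sequence is an assumption. `summarize_alignment` is a hypothetical helper.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def summarize_alignment(pos: int, cigar: str) -> dict:
    """Return the reference span and trailing soft-clip length of an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing_clip = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return {
        "ref_start": pos,
        "ref_end": pos + ref_len - 1,  # inclusive, 1-based
        "trailing_softclip": trailing_clip,
    }


# A few of the alignments listed above (position 5860 on L1HS).
for cigar in ["205M12S", "140M1D64M10S", "205M17S", "119M1D52M10I32M"]:
    print(cigar, summarize_alignment(5860, cigar))
```

For example, `205M12S` starting at 5860 covers L1HS positions 5860 to 6064 with 12 soft-clipped bases at the end. Across the reads listed, the trailing soft clip ranges from 8 to 17 bases.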

Expect insertion to match multiple locations in genome.

BLAT of 166 bp of the sequence (excluding the AAA... tail) gives two 100% matches (generally a longer sequence is required for a positive ID):
YourSeq 166 1 166 166 100.0% chr2 - 159252193 159252358 166
YourSeq 166 1 166 166 100.0% chr22 + 29130583 29130748 166

These are both full-length L1HS elements and likely capable of retrotransposition (ORFs not checked). As such, either is a likely source of the insertion. (Verification requires some transduced sequence along with the L1 insertion to positively identify the source.)

IGV Tumor insertion image

IGV Tumor insertion, concordant with SNP image

Illumina added (no 28 bp duplication?) image

However, the 28 bp duplication is present in the PacBio NORMAL (bottom panel). image