jzook / testHG008curation

This repository is primarily for the Genome in a Bottle Consortium to curate structural variants in HG008
https://www.nist.gov/programs-projects/cancer-genome-bottle

[chr5:4450315-4450851][INS][SVLEN=536bp] #178

Open jzook opened 1 month ago

jzook commented 1 month ago

chr5 4450315 Minda_108 N <INS> . PASS SVLEN=536;SVTYPE=INS;SUPP_VEC=ONT_severus_INS18415,ONT_ID_30843_1,ONT_i_292,PB_severus_INS16227,PB_Sniffles2.INS.C0M4,PB_i_708,ONT_Sniffles2.INS.BDM4
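For readers who want to check the record above programmatically, here is a minimal sketch of splitting its INFO column and grouping the SUPP_VEC entries by sequencing platform. The helper name `parse_info` is hypothetical, and treating the text before the first underscore as the platform (ONT or PB) is an assumption based only on the naming pattern visible in this record.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# The record from this issue, whitespace-separated as pasted above.
record = (
    "chr5 4450315 Minda_108 N <INS> . PASS "
    "SVLEN=536;SVTYPE=INS;SUPP_VEC=ONT_severus_INS18415,ONT_ID_30843_1,"
    "ONT_i_292,PB_severus_INS16227,PB_Sniffles2.INS.C0M4,PB_i_708,"
    "ONT_Sniffles2.INS.BDM4"
)

chrom, pos, var_id, ref, alt, qual, filt, info_str = record.split()
info = parse_info(info_str)
svlen = int(info["SVLEN"])
supporting = info["SUPP_VEC"].split(",")

# Assumption: the prefix before the first underscore names the platform.
by_platform = {}
for call in supporting:
    platform = call.split("_", 1)[0]
    by_platform.setdefault(platform, []).append(call)

for platform, calls in sorted(by_platform.items()):
    print(platform, len(calls), calls)
```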

https://v2.genomeribbon.com/?session=https://42basepairs.com/download/s3/giab-data/ribbon-json/ribbon-hg008-cov-bedpe.json&locus=chr5:4450315#ribbon

mikolmogorov commented 1 month ago

[Image: Screenshot 2024-09-30 at 4 33 23 PM]

jo-mc commented 1 month ago

Looks like a LINE-1 insertion. It has the 5' end of L1 (228bp), some transduced sequence, and some unknown/unrelated sequence between the poly-A's that matches chr15:82350094 (chm13). It also has a 5' inversion. Target site duplication: TGAAAGTAGGCATATC

Originates from chrX:11517710 (chm13), which has a full-length L1HS nearby (high probability; chr6 is also possible).

ins178 TGAAAGTAGGCATATCGTACACAAATTTGATGAGTTTTGACACATGCAAACAGCCATGAAACCATCATAACAATTAAAATAACAAACATTTCCATGTTTGCTTTGTTTTGTTTGTTTTTTTTTTTTTTTTCATGTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTATACTCTAAGTTTTAGGGTACATGTGCACATTGTGCAGGTTAGTTACATATGTATACATGTGCCATGCTTGTGCGCTGCACCCACTAATGTGTCATCTAGCATTAGGTATATCTCCCAATGCTATCCCTCCCCCCTCCCCCGACCCCACCACAGTCCCCAGAGTGTGATATTCCCCTTCCTGTGTCCATGTAAATTTGTGTACGTTAAATATGTGAAACTTATTGTATGCTGGTTACACCTCAATAAAGCTGTTAAATTTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAGATGTAAGTAGAAATAGCAAAAAGTTAAAAAGCAGGACAAATTAAAATAGAGTTTTTATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
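A rough way to check two of the features described above (the target site duplication at the edge of the insert and the long poly-A/poly-T runs) is sketched below. The function names and the 15bp run threshold are illustrative assumptions, and only the first 32 bases of ins178 are pasted here for brevity; substitute the full sequence to scan all runs.

```python
import re


def tsd_location(insert_seq, tsd):
    """Report whether the TSD sits at the start or end of the inserted sequence."""
    if insert_seq.startswith(tsd):
        return "start"
    if insert_seq.endswith(tsd):
        return "end"
    return None


def homopolymer_runs(seq, bases="AT", min_len=15):
    """Return (start, end, base) for each run of a single base >= min_len (0-based, end-exclusive)."""
    pattern = re.compile("|".join(f"{b}{{{min_len},}}" for b in bases))
    return [(m.start(), m.end(), m.group()[0]) for m in pattern.finditer(seq)]


TSD = "TGAAAGTAGGCATATC"
# First 32 bases of ins178 only (truncated for brevity).
ins178_prefix = "TGAAAGTAGGCATATCGTACACAAATTTGATG"

print(tsd_location(ins178_prefix, TSD))
print(homopolymer_runs(ins178_prefix))
```

Run on the full sequence, the poly-A/T coordinates give the boundaries to use when splitting the insert into parts for realignment.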

Primary mappings: part 1 and part 2 are consecutive on chrX, with part 2 inverted. Part 3 is a mystery so far:
part1_pos1-366 0 chrX 11517710 1 15S95M1I18M2I5M9I221M
part2_pos366-460 16 chrX 11517633 1 21M4I69M
part3_461-572 0 chr15 82350092 10 3S63M44S
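To make the CIGAR strings above easier to read, this sketch sums the query-consuming and reference-consuming operations of each alignment and prints the reference interval it covers. It follows the standard SAM convention (M, I, S, =, X consume query; M, D, N, =, X consume reference); the function name `cigar_lengths` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume bases of the query
REF_OPS = set("MDN=X")    # operations that consume bases of the reference


def cigar_lengths(cigar):
    """Return (query_length, reference_span) for a CIGAR string."""
    query = ref = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            query += n
        if op in REF_OPS:
            ref += n
    return query, ref


# (name, chromosome, 1-based leftmost position, CIGAR) from the primary mappings.
alignments = [
    ("part1", "chrX", 11517710, "15S95M1I18M2I5M9I221M"),
    ("part2", "chrX", 11517633, "21M4I69M"),
    ("part3", "chr15", 82350092, "3S63M44S"),
]

for name, chrom, pos, cigar in alignments:
    query_len, ref_span = cigar_lengths(cigar)
    end = pos + ref_span - 1
    print(f"{name}: query={query_len}bp ref={chrom}:{pos}-{end}")
```

For part 1 the query length comes out to 366bp, matching the `pos1-366` label, and part 2's reference interval ends just past the start of part 1's, which is consistent with the two parts being adjacent on chrX.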

Secondary mappings:
part1 272 chr6 71196943 0 194M18I38M3I98M15S
part2 256 chr6 71197260 0 76M18S 0
part2 256 chr7 17184621 0 66M4I21M3S
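The second column in the mappings above is the SAM FLAG. As a small sketch (the helper name `describe_flag` is hypothetical), the values 0, 16, 256 and 272 can be decoded from the standard bits 0x10 (reverse strand), 0x100 (secondary) and 0x800 (supplementary):

```python
def describe_flag(flag):
    """Return (strand, alignment kind) for a SAM FLAG value."""
    strand = "reverse" if flag & 0x10 else "forward"
    if flag & 0x100:
        kind = "secondary"
    elif flag & 0x800:
        kind = "supplementary"
    else:
        kind = "primary"
    return strand, kind


for flag in (0, 16, 256, 272):
    print(flag, describe_flag(flag))
```

Flag 16 on part 2 of the primary mappings is what indicates the inversion relative to part 1, and 272 (256 + 16) marks a secondary alignment on the reverse strand.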

 

jo-mc commented 1 month ago

Examining reads from both the tumor and normal samples shows a full-length LINE-1 insertion at chr15:82350092. This structural variant (SV) is not present in the reference genome but is detected in both tumor and normal samples. Its heterozygous state suggests it is either a de novo mutation or inherited from a parent. Tracing the transduction in that chr15 insertion, we find it most likely originates from a reference-annotated L1HS at chrX:11517150. The insertion curated in this issue (576bp) is observed at chr5:4377921. Its source is the non-reference LINE-1 insertion at chr15:82350092, which is itself a full-length LINE-1 plus a transduction from chrX.

This chr5 insertion represents a truncated LINE-1 retrotransposon that includes additional transduced sequence from the source non-reference LINE-1 on chr15.

(all genome coords above are for chm13v2)

Adjustment: remove the tandem repeat tag. The spreadsheet possibly has wrong information (or maybe this is a new SV?); it should be INS 536bp.

spreadsheet row 33:
chr5 | 4450329 | T | <DEL> | DEL | -19920227 | 24370556 | 0

minda:
chr5    4450315 Minda_108       N       <INS>   .       PASS    SVLEN=536;SVTYPE=INS;SUPP_VEC=O
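The mismatch between the spreadsheet row and the minda call can be flagged with a simple comparison like the sketch below. The `SVCall` class, the `discrepancies` function and the 50bp position tolerance are all hypothetical. Reading the sixth and seventh spreadsheet columns as SVLEN and END is an assumption, though it is consistent with 4450329 + 19920227 = 24370556.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    source: str
    chrom: str
    pos: int
    svtype: str
    svlen: int


def discrepancies(a, b, max_pos_diff=50):
    """List the ways two calls at (supposedly) the same locus disagree."""
    notes = []
    if a.chrom != b.chrom:
        notes.append(f"chromosome differs: {a.chrom} vs {b.chrom}")
    elif abs(a.pos - b.pos) > max_pos_diff:
        notes.append(f"positions {abs(a.pos - b.pos)}bp apart")
    if a.svtype != b.svtype:
        notes.append(f"SVTYPE differs: {a.source}={a.svtype}, {b.source}={b.svtype}")
    if abs(a.svlen) != abs(b.svlen):
        notes.append(f"SVLEN differs: {a.source}={a.svlen}, {b.source}={b.svlen}")
    return notes


# Assumption: spreadsheet column 6 is SVLEN (column 7 would then be END).
spreadsheet = SVCall("spreadsheet", "chr5", 4450329, "DEL", -19920227)
minda = SVCall("minda", "chr5", 4450315, "INS", 536)

for note in discrepancies(spreadsheet, minda):
    print(note)
```

The two positions are only 14bp apart, so the rows plausibly refer to the same locus, while the type and length disagree.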
jzook commented 1 month ago

Interesting that this insertion comes from a germline L1 insertion on chr15 relative to the reference! I added a new line with the 589bp inserted sequence from the assembly, which seems to roughly represent the HiFi reads (better than minda), though the reads are variable in length.

GAAAGTAGGCATATCGTACACAAATTTGATGAGTTTTGACACATGCAAACAGCCATGAAACCATCATAACAATTAAAATAACAAACATTTCCATGTTTGCTTTGTTTTGTTTGTTTTTTTTTTTTTTTTCATGTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTATACTCTAAGTTTTAGGGTACATGTGCACATTGTGCAGGTTAGTTACATATGTATACATGTGCCATGCTTGTGCGCTGCACCCACTAATGTGTCATCTAGCATTAGGTATATCTCCCAATGCTATCCCTCCCCCCTCCCCCGACCCCACCACAGTCCCCAGAGTGTGATATTCCCCTTCCTGTGTCCATGTAAATTTGTGTACGTTAAATATGTGAAACTTATTGTATGCTGGTTACACCTCAATAAAGCTGTTAAATTTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAGATGTAAGTAGAAATAGCAAAAAGTTAAAAAGCAGGACAAATTAAAATAGAGTTTTTATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
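Since the length variability here is concentrated in long poly-A/poly-T runs, one way to compare the assembly sequence with the minda sequence is to collapse homopolymers before comparing. This is only a sketch of that general technique, not a method used in this thread; `hp_compress` is a hypothetical helper and the strings below are toy examples rather than the sequences from this issue.

```python
from itertools import groupby


def hp_compress(seq):
    """Collapse every run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq))


# Toy sequences that differ only in homopolymer length.
seq_a = "GATTTTTTTTCAAAAAAG"
seq_b = "GATTTTCAAAAAAAAAAAG"

print(len(seq_a), len(seq_b))
print(hp_compress(seq_a) == hp_compress(seq_b))
```

Pasting the two full inserted sequences in place of the toy strings would show whether they differ only in homopolymer length or also in other bases.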